MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
ARTIFICIAL INTELLIGENCE LABORATORY 

A.I. Memo 377 August 1976



REPRESENTATION AND RECOGNITION OF THE SPATIAL 
ORGANIZATION OF THREE DIMENSIONAL SHAPES



by 



D. Marr and H. K. Nishihara



ABSTRACT. A method is given for representing 3-D shapes. It is based on a hierarchy of stick figures
(called 3-D models), where each stick corresponds to an axis in the shape's generalized cone representation.
Although the representation of a complete shape may contain many stick figures at different levels of
detail, only one stick figure is examined at a time while the representation is being used to interpret an
image. By thus balancing scope of description against detail, the complexity of the computations needed
to support the representation is minimized. The method requires (a) a database of stored stick figures; (b)
a simple device called the image-space processor for moving between object-centered and viewer-centered
coordinate frames; and (c) a process for "relaxing" a stored model onto the image during recognition.
The relation of the theory to "mental rotation" phenomena is discussed, and some critical experimental
predictions are made.



This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute
of Technology. Support for the laboratory's artificial intelligence research is provided in part by the
Advanced Research Projects Agency of the Department of Defense under Office of Naval Research
contract N00014-75-C-0643.



1. A method is given for representing 3-D shapes. It is based on a hierarchy of stick figures
(called 3-D models), where each stick corresponds to an axis in the shape's generalized cone
representation.

2. By using stick figures to represent a shape and its parts at several levels of detail, a
representation is obtained that is intrinsically simple, yet which maintains its fidelity to an
arbitrary level of precision.

3. While the representation is being used to interpret an image, only one stick figure is
examined at a time. By thus balancing scope of description against detail, the complexity of
the computations needed to support the representation is minimized.

4. The structures and processes associated with the method are described. The most
important are (a) a database of stored stick figures, which are indexed in several ways; (b)
an image-space processor, which is a simple mechanism for moving between object-centered
and viewer-centered coordinate frames; and (c) a process for "relaxing" a stored model onto
the image during the recognition and representation of spatial orientation.

5. Some facets of the theory's relaxation process resemble the computation of a 3-D rotation,
but a computer graphics metaphor is misleading. In fact the manipulations take place on
abstract vectors (the sticks) that are not even present in the original image, and it is roughly
correct to say that only two such vectors are explicitly represented at a time.

6. If the method is taken as a psychological theory, it makes a critical prediction which, if
false, would disprove it. Views of an object in which an important axis of its generalized
cone representation is severely foreshortened are peculiarly difficult to interpret. Such views
are not uncommon, and it is predicted that this class of views corresponds to those that
Warrington & Taylor (1973) labelled "unconventional". Their patients should therefore fail
on these views.

7. The theory provides an explanation of most of the experimental results concerning mental
rotation that have recently been discovered by R. N. Shepard and his colleagues. The linear
dependence between time to interpret and 3-D angular discrepancy is however not a deep
consequence of the theory, merely the signature of implementing it in a particularly simple
way.



I: Introduction 

At some point during the analysis of a two-dimensional image of an
object, the three-dimensional structure of the viewed object and its spatial relation to the
viewer must be established and represented. The question is how? The form of the answer
we require is not a detailed specification of some complex neurophysiological mechanism,
although eventually one will wish to derive such a thing. First, we need a more abstract
understanding of the computational problems involved, that shows when and how to use the
various kinds of information that are available from an image. The understanding that we
seek may be expressed as a method (see Marr 1976a); it amounts to a competence theory for
this aspect of 3-D vision.

This article presents such a method, and it has four key ingredients:
(a) The deep structure of the three-dimensional representation of an object's shape consists
of a coarse stick figure, whose sticks correspond to axes of the major components of the
shape (such as arms, torso, head); and of individually addressable stick figures for each of
the component shapes. In this way, arbitrary detail can be represented in a system each of
whose component stick figures is rather simple, yet which maintains faithfully the important
shape characteristics at each level of description.

(b) Each stick figure is defined by a propositional database called a 3-D model. The
geometrical structure of a 3-D model is specified by storing the relative orientations of pairs
of connecting sticks. Thus the specification is made in a local coordinate system based on
the principal component of the shape at that level of description, not in absolute coordinates
based on a circumscribing frame of reference.

(c) When a 3-D model is being used to interpret an image, a computation must be made that
relates the geometrical relationships among the sticks of the 3-D model to the 2-D
relationships among the projections of those sticks in the image. The computation depends
upon the orientation and location of the 3-D model relative to the viewer. This is
accomplished by a computationally simple mechanism called the image-space processor, which
may be thought of as a device for transforming a vector between object-centered and
viewer-centered coordinate systems.

(d) During recognition, a sophisticated interaction takes place between the image, the 3-D
model, and the image-space processor. This interaction gradually relaxes the stored 3-D
model so that its axes project onto the axes computed from the image. Some facets of this
process resemble the computation of a 3-D rotation, but a simple computer graphics
metaphor is misleading. In fact, the rotations take place on abstract vectors (the axes) that
are not even present in the original image; and it is roughly correct to say that only two
such vectors are explicitly represented at a time.
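
The role of the image-space processor in (c) can be illustrated with a minimal sketch. It assumes an orthographic projection along the viewer's z axis and describes the orientation of the object's frame by a 3x3 rotation matrix; neither assumption comes from the text, and all function names are hypothetical.

```python
import math

def rotation_about_y(angle_deg):
    """Rotation matrix (row-major) for a turn of angle_deg about the viewer's y axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def object_to_viewer(R, v):
    """Express an object-centered vector v in viewer-centered coordinates."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def viewer_to_object(R, v):
    """Inverse transform: the inverse of a rotation matrix is its transpose."""
    return [sum(R[j][i] * v[j] for j in range(3)) for i in range(3)]

def projected_length(v):
    """Assumption: orthographic projection, so the image keeps only x and y."""
    return math.hypot(v[0], v[1])

axis = [1.0, 0.0, 0.0]  # a unit-length stick along the object's x axis
for angle in (0, 60, 90):
    seen = object_to_viewer(rotation_about_y(angle), axis)
    print(angle, round(projected_length(seen), 3))
```

Under these assumptions the projected length of the stick shrinks as it turns toward the line of sight and vanishes at 90 degrees, which is the foreshortened case that the theory predicts to be difficult.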

Thus the essence of the theory is a method for representing the spatial
disposition of the parts of an object and their relation to the viewer. We believe that it may
shed some light on the phenomena of mental rotation uncovered by R. N. Shepard and his
collaborators, and on certain neurological findings reported by Warrington & Taylor (1973).
Background: modular decomposition of the recognition process

Our overall picture of the recognition problem is illustrated in figure 1,
which embodies two points that we take as assumptions. Firstly, to a first approximation the



[Figure 1 diagram. Stages from top to bottom: 3-D model representation of a shape (3-D model
catalog); (image-space processor); axis-based description of shapes in image (Marr 1976c,
Vatan & Marr 1976); regions, corresponding in simple images to objects; primal sketch
(Marr 1976b); image.]



Figure 1. This diagram summarizes our overall view of the visual recognition problem, and
it embodies several points that this article takes as assumptions. The first is that the
recognition process decomposes to a set of modules that are to a first approximation
independent. The simplified subdivision shown here consists of four main stages, each of
which may contain several modules. (1) The translation of the image into a primitive
description called the primal sketch (Marr 1976b); (2) The division of the primal sketch into
regions or forms, through the action of various grouping processes ranging in scope from
the very local to global predicates like a rough type of connectedness; (3) The alignment of
an axis-based description to each form (see figure 4); and (4) The construction of a 3-D
model for the viewed shape, based initially on the axes delivered by (3). The relation
between the 3-D model representation of a shape and the image of that shape is found and
maintained with the help of the image-space processor. Finally, the representation of the
geometry of a shape is separate from the representation of the shape's use or purpose
(Warrington & Taylor 1973).



process of visual recognition decomposes to a set of modular steps. The evidence for this is
extensive but indirect. It includes evidence from electrophysiological recordings from
Adrian (1941) to Hubel & Wiesel (1962, 1965), Barlow, Blakemore & Pettigrew (1967), and Zeki
(1973); histological and neuroanatomical evidence from Brodmann (1909) and Cajal (1911) to
modern studies such as those of Zeki (1971), Allman, Kaas, Lane & Miezin (1972), Allman,
Kaas & Lane (1973), Allman & Kaas (1974a, b & c); and the mass of clinical studies describing
patients who have lost particular and highly circumscribed functional parts of their
perceptual or motor faculties (Critchley 1953, Luria 1970, Vinken & Bruyn 1969). Evidence
against the assumption of modularity in its strictest form comes from illusions in which
quite late processing or high-level knowledge about an image appears to influence earlier
processing: for example, shape recognition normally follows figure-ground separation, but
can sometimes influence it (e.g. Street 1931). According to the assumption of modularity,
these effects should be regarded as second-order interactions between modules that are to a
first approximation independent (Marr 1976b).

Secondly, we assume that there exists a module (or group of modules)
that is concerned with describing the 3-D shape of an item, and that this module is separate
from the representation of an item's functional semantics. The evidence for this is a
penetrating analysis by Warrington & Taylor (1973), who concluded that these two functions
reside in distinct cortical areas. Patients with left parietal lesions showed disorders related to
the use and purpose of an object, but their ability to recognize and represent its 3-D shape
appeared to be intact. The opposite was true of patients with right parietal lesions.

This article describes a theory of the representation and recognition of
3-D shape. Some parts of it, including the 3-D representation scheme and the image-space
processor, are precisely defined. Other parts, for example those concerned with database
access during recognition, are not yet rigorous. The reader will recognize that the looser
parts of the theory are those that are closely intertwined with other modules that we have
not yet studied, and cannot be made precise until the exact nature of those modules, and
what they can deliver from an image, has been defined. We recognize the shortcomings in
this account that arise for this reason, but believe that in order to rectify them one has to
have a clear grasp of a larger portion of the overall recognition process than the particular
3-D module described here. The theory as described here, together with other work (Marr
1976b, Marr 1976c, Vatan & Marr 1976, Marr & Poggio 1976a, Ullman 1976) summarised
briefly by Marr & Poggio (1976b), represents an attempt at decomposing the vision problem
into modules. Study of the interactions between modules must follow this.

General nature of the 3-D representation
Methods for deriving and manipulating the representation of a 3-D
shape depend heavily on the nature of the representation used. Our first task is therefore
to discover which representation is most appropriate. There are four ideas current in the
literature: the "multiple view" representation described by Minsky (1975), Baumgart's (1975)
representation by polyhedral approximation, the "generalized cylinder" representation
proposed by Binford (1971), and Blum's (1973) "symmetric axis" representation, which is
similar to the generalized cylinder representation for 2-D shapes, but differs from it in three







Figure 3. This figure is taken from figure 1 of Baumgart (1975), and illustrates his
representation of 3-D shape by polyhedral approximation. From three views of a plastic
horse, the silhouettes (a), (b) and (c) were obtained. A 3-D structure was computed from
these silhouettes by a cone intersection technique, and the polyhedral representation of the
resulting shape is illustrated in (d). Various disadvantages of this representation, of which
the most severe is its lack of uniqueness, combine to render it an unlikely candidate for the
psychological representation of 3-D shape.



dimensions.

The multiple view representation is based on the insight that if one
chooses one's primitives correctly (e.g. the "side" of a cube), the number of qualitatively
different views of an object may be quite small. Minsky (1975) proposed that the
representation of a 3-D shape ought therefore consist of a catalogue of the different
appearances of that shape, and that this catalogue would not need to be too large. The
multiple view representation is at present underdefined - for example, are all "views" of a
man the same in which the same limbs are visible but arranged in different positions? -
and so it is difficult to argue cogently against it. Nevertheless something of a case against it
can be made from Warrington & Taylor's (1973) findings. The side view of a water pail is
very different from the top view, and both are reasonably simple (see figure 2). Since both
views are probably equally common, one would expect the multiple view representation to
contain and (presumably) to have indexed both of them. If the lesions of Warrington &
Taylor's patients had randomly damaged a multiple view representation, one would expect
some patients to have lost one view, and others, another. But the finding is that all patients
are impaired on the same view (the one from above), views that Warrington & Taylor called
"unconventional". Although the multiple view representation is not absolutely incompatible
with these findings, strong extra assumptions are needed to incorporate them.

Baumgart (1975) has proposed using a system of polyhedral
approximations to 3-D shapes (see figure 3). The motivation for this is that computer
graphics systems make it easy to manipulate representations constructed of straight-edge
segments, and the comparison between the expected view and the actual view of a
polyhedral structure is therefore feasible. He makes no claims that this representation has
any psychological importance, however, and the features that make it attractive for machine
vision tend to make it an unattractive candidate for psychology. Although Baumgart has
addressed with some success the problem of constructing a 3-D model from several views of
an object, he has not shown how to recognize a known model from just one monocular view.
More seriously, there is no real sense of uniqueness in his representation. A horse shape can
be approximated in many ways by polyhedra, and there is no guarantee that the
representations obtained on two different occasions from different sets of views will be
homologous. A representation that lacks a strong uniqueness condition will be almost useless
for recognition. There are also other difficulties with polyhedral approximation. They
include the lack of any natural representation of articulation of parts of an object (e.g. arms
and legs); the difficulty of answering overall questions about an object, like where it is
pointing, given only a set of polyhedra each of which describes some small part; and the
complex way in which joins between polyhedra have to be specified. As a candidate for
psychology, this representation at present seems to have no particular advantages and
several disadvantages. We shall therefore not consider it further.

A generalized cylinder is the surface swept out by moving a cross-section
along an axis. The axis need not be straight, and the cross-section may vary. The
generalised cylinder representation of an object is obtained by splitting it up into
components each of which is described in this way. A generalized cone is a generalized
cylinder in which the shape of the cross-section remains constant but for smooth variations







Figure 2. (a) and (b) show two views of a water-pail. Warrington & Taylor's (1973) patients
are impaired on (b), but not on (a). This is difficult to reconcile with Minsky's (1975)
multiple view representation, since both views are about as common. It is consistent with the
3-D model representation, for reasons that are clear from (c) and (d). The outlines of the
original figures are shown as thin lines, and the axis is shown as a thick one. This axis is
directly recoverable from image (a), but not from (b) where it is severely foreshortened.
Since the 3-D model representation relies on an explicit representation of this axis, the
successful recognition of views like (b) requires considerable extra computation.



in size.
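
As a minimal sketch of this definition, the following samples points on a generalized cone with a straight axis and a circular cross-section whose radius varies smoothly along the axis. The straight axis, the circular cross-section and the function names are simplifying assumptions made for illustration; the definition itself allows curved axes and other cross-section shapes.

```python
import math

def generalized_cone_points(length, radius_at, n_axis=5, n_around=8):
    """Sample surface points of a cone swept along the z axis.

    radius_at(t) gives the cross-section radius at fraction t in [0, 1]
    along the axis. The cross-section shape (assumed here to be a circle)
    stays constant; only its size changes.
    """
    points = []
    for i in range(n_axis):
        t = i / (n_axis - 1)
        z = t * length
        r = radius_at(t)
        for k in range(n_around):
            phi = 2.0 * math.pi * k / n_around
            points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

# A tapering limb: the radius shrinks linearly from 1.0 to 0.5 along the axis.
limb = generalized_cone_points(10.0, lambda t: 1.0 - 0.5 * t)
print(len(limb))
```

A regular cylinder, the restricted case used later in this article, corresponds to a constant `radius_at`.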

Agin (1973) and Nevatia (1974) used a laser range-finding technique to
obtain the generalized cylinder representation of objects such as a Barbie doll, a snake and
a horse. Hollerbach (1975) showed how contour information may be used to derive the
generalized cylinder representation of a wide range of pottery, and he found that the
descriptive terminology for such artifacts in the archaeological literature corresponds
naturally to terms that appear in the generalized cylinder representation. Marr (1976c) has
proved that certain assumptions, which are implicit in the derivation of shape from contour,
are equivalent to assuming that the viewed shapes are composed of generalized cones; and
Vatan & Marr (1976) have constructed algorithms for segmenting the monocular image of a
shape into its generalized cone components (see figure 4).

Blum (1973) has developed a geometry of shape based on the notion of
growth outward from a point. In two dimensions, his representation may be obtained by
imagining a fire lit at all points around an outline. The fire from opposite "sides" of a
figure will meet in the middle, along what Blum calls the figure's "symmetric axis". The
representation consists of inverting this process, specifying the symmetric axis and the
degree of growth outward from each point on it.
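
The fire metaphor can be sketched on a pixel grid. The example below is a rough discrete approximation, not Blum's construction: it assumes a 4-connected grid, measures the fire's arrival time as the number of steps from outside the figure, and takes the symmetric axis to be the cells where that arrival time is a local maximum. All names are illustrative.

```python
from collections import deque

NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumption: 4-connected grid

def arrival_times(shape):
    """Steps for a fire lit all around the outline to reach each cell of the shape."""
    times = {}
    queue = deque()
    # Cells on the outline (touching the outside) burn at time 1.
    for (r, c) in shape:
        if any((r + dr, c + dc) not in shape for dr, dc in NEIGHBOURS):
            times[(r, c)] = 1
            queue.append((r, c))
    # Breadth-first spread of the fire into the interior.
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            cell = (r + dr, c + dc)
            if cell in shape and cell not in times:
                times[cell] = times[(r, c)] + 1
                queue.append(cell)
    return times

def symmetric_axis(shape):
    """Cells where fire fronts meet: no neighbour burns later than the cell itself."""
    times = arrival_times(shape)
    return {
        cell for cell, t in times.items()
        if all(times.get((cell[0] + dr, cell[1] + dc), 0) <= t
               for dr, dc in NEIGHBOURS)
    }

rectangle = {(r, c) for r in range(5) for c in range(9)}
axis = symmetric_axis(rectangle)
```

For this rectangle the sketch yields the central stretch of the middle row together with short diagonal branches running toward the corners. Storing the arrival time at each axis cell corresponds to the "degree of growth outward" that the representation records.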

For two-dimensional shapes, this representation resembles the generalized
cylinder representation, although it is not identical. For three dimensions however, the
"symmetric axis" may be two-dimensional, so this representation differs from generalized
cylinders in a substantial way. Of the two representations, generalized cones seem to be
preferable because for three-dimensional surfaces they are simpler, and because of their
intimate connection with assumptions that are implicit in the interpretation of occluding
contours in an image (Marr 1976c).

The generalized cone representation introduces two main problems:
obtaining the axes and the cross-sections of the different parts of an object (arms, legs,
torso), and representing the spatial disposition of the components thus obtained. These tasks
are nearly independent, and this article is concerned only with the second of them: how to
represent the arrangement in space of the different cones into which the viewed shape is
decomposed. To solve this problem, it is enough to represent the spatial dispositions of the
axes that occur in an object's generalized cone representation, which is equivalent to the
problem of describing stick figures - models made out of pipe-cleaners, one for each axis
(see figure 5). Such models exhibit only the lengths and disposition of axes in the
generalized cylinder representation, yet we can easily discern the giraffe, ostrich and goat in
the figure. That their recognition is so easy makes it reasonable to suppose that we
ourselves decompose the 3-D representation problem into similar components.

II: The Structures of the theory 

The theory consists of a method for determining and representing the
three-dimensional dispositions of a stick figure's axes for the purpose of recognition, given
only a two-dimensional projection of those axes. It rests on the interplay between the image
and two other structures: a database of stored representations of shapes (the 3-D models),
and a mechanism for performing coordinate transforms (the image-space processor). The



Figure 4. Analysis of a contour from Vatan and Marr (1976). The outline (a) was obtained
by applying local grouping operations to a primal sketch (Marr 1976b). It is then smoothed,
and divided into convex and concave components (b). The outline is searched for deeply
concave points or components, which correspond to strong segmentation points. One such
point is marked with an open circle in (c). There are usually several possible matching
points for each strong segmentation point, and the candidates for the marked point are
shown here by filled circles (c). The correct mates for each segmentation point can usually
be found by eliminating relatively poor candidates. The result of doing this here is the
segmentation shown in (d). Once these segments have been defined, their corresponding
axes (thick lines) are easy to obtain (e). They do not usually connect, but may be related to
one another by intermediate lines which are called embedding relations (thin lines in (f)).
According to the present theory, the resulting stick figure (f) is the deep structure on which
interpretation of this image is based.



Figure 5. The theory asserts that the 3-D representation of a shape is decomposed into two
parts, the description of the cross-sections that occur in the shape's generalised cone
representation, and the disposition of the axes of these cones in space. Our theory deals
with the second problem, which is essentially the problem of describing stick figures. The
shapes in these pictures were made out of pipe-cleaners. The reader will have no trouble in
recognising the giraffe, goat, rabbit and ostrich. That their recognition is so easy makes it
reasonable to suppose that at some stage, we ourselves decompose the 3-D representation
problem into similar components.



Figure 6. Examples of 3-D models and their arrangement into the 3-D model representation
of a human shape. A 3-D model consists of a model axis and component axes (left and
right figures respectively in the box labeled HUMAN), the latter consisting of a principal axis
(the torso) and several auxiliary axes (the head and limbs) whose positions are described
relative to the principal axis. The complete human 3-D model is enclosed in the rectangle
labeled HUMAN. The 3-D model representation is obtained by concatenating 3-D models
for different parts at different levels of detail. This is achieved by allowing a component
axis of one 3-D model to be the model axis of another. Here, for example, the arm
auxiliary axis in the human 3-D model acts as the model axis for the arm 3-D model, which
itself has two component axes, the upper and lower arms. The figure shows how this
scheme extends downwards as far as the fingers.




[Figure 6 diagram. Box labels include ARM, LOWER-ARM and DISTAL FINGER.]




basic strategy of our approach rests on the principles of least commitment and graceful
degradation (see below and Marr 1976b), so that the method depends greatly on the analysis
of constraints that arise at different stages of the processing. In this section and the next,
we give a discursive account of the structures of our theory, and of the processes that
employ them. The appendix describes a particular computer implementation of the theory,
and gives an example of its application.

The 3-D model representation of shape
Our representation of 3-D shape is based on the idea of a stick figure,
where each stick is the axis of a generalized cone (as defined above). For the purpose of
this paper we shall limit ourselves still further, to regular cylinders in place of generalized
cones. The basic element in the description of shape is called a 3-D model and consists of:
(i) A model axis, which provides a very coarse specification of the general
size and orientation of the shape.

(ii) A small number (possibly zero) of component axes. The component axes
consist of a distinguished axis called the principal axis of the 3-D model,
and a number of auxiliary axes. The dispositions of the auxiliary axes are
defined relative to the principal axis, and that of the principal axis is
defined relative to the model axis.

(iii) Associated with each axis is a shape description, which in the present
restricted theory consists of the specification of a cylinder.
For example, the 3-D model for the overall shape of a human has six component axes, in
addition to the single model axis for the whole shape. The principal axis corresponds to the
torso, and the five remaining component axes correspond to the head and limbs that are
connected to it (see figure 6).
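
The three ingredients (i) to (iii) can be written down as a small data structure. This is a minimal sketch: the class and field names are hypothetical, and the cylinder dimensions in the example are placeholders, not values from the theory.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Cylinder:
    length: float
    radius: float

@dataclass
class Axis:
    name: str
    shape: Cylinder  # (iii) the shape description attached to the axis

@dataclass
class Model3D:
    model_axis: Axis                       # (i) coarse size and orientation
    principal_axis: Optional[Axis] = None  # (ii) distinguished component axis
    auxiliary_axes: List[Axis] = field(default_factory=list)

    def component_axes(self) -> List[Axis]:
        """Principal axis first, then the auxiliary axes (possibly none at all)."""
        if self.principal_axis is None:
            return []
        return [self.principal_axis] + self.auxiliary_axes

# Placeholder dimensions, chosen only to make the example concrete.
human = Model3D(
    model_axis=Axis("human", Cylinder(1.0, 0.15)),
    principal_axis=Axis("torso", Cylinder(0.4, 0.12)),
    auxiliary_axes=[Axis(name, Cylinder(0.4, 0.05)) for name in
                    ("head", "left-arm", "right-arm", "left-leg", "right-leg")],
)
print(len(human.component_axes()))
```

The sketch leaves out the relative dispositions of the axes, which are the subject of the adjunct relation described later.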

Although a single 3-D model is a simple structure, several may be
combined to create a description of arbitrary depth and complexity. This is achieved by the
concatenation rule for 3-D models, according to which a component axis of one 3-D model
serves as the model axis of another. By combining 3-D models in this way, one can
build up descriptions of a particular physical structure to whatever level of detail is
required. Such a description is called the 3-D model representation of a physical structure.

Figure 6 illustrates how model concatenation is used to create the 3-D
model representation of a human shape, and it exhibits the hierarchy that concatenation
induces. At the top level is the 3-D model for the overall human shape. As we saw above,
this contains a single cylinder description of the overall shape (based on the model axis),
and axes for each of the shape's six major components. The next level of detail contains
3-D models for each of these components. For example the arm 3-D model consists of a
model axis, which coincides with the arm auxiliary axis in the human 3-D model, and two
component axes that correspond to the upper and lower arms. The hierarchy extends in
similar fashion through 3-D models for the lower arm, hand, and finger, and each step is
illustrated in figure 6. In this way, a 3-D model representation may be built to capture the
geometry of a shape to whatever level of detail is required.
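
The concatenation rule can be sketched as a catalogue in which the name of a component axis in one 3-D model is also the name of another 3-D model. The catalogue contents below are assumptions made for illustration (only the human and arm entries follow the text directly), and the function names are hypothetical.

```python
# Hypothetical catalogue: model name -> names of its component axes.
# A component name that is also a key is the model axis of a finer 3-D model.
CATALOGUE = {
    "human": ["torso", "head", "left-arm", "right-arm", "left-leg", "right-leg"],
    "left-arm": ["upper-arm", "lower-arm"],
    "lower-arm": ["forearm", "hand"],
    "hand": ["palm", "finger"],
    "finger": ["proximal", "middle", "distal"],
}

def describe(name, depth, catalogue=CATALOGUE, level=0):
    """List the axes reachable from model `name`, expanding at most `depth` levels."""
    lines = ["  " * level + name]
    if depth > 0:
        for component in catalogue.get(name, []):
            lines.extend(describe(component, depth - 1, catalogue, level + 1))
    return lines

def path_to(target, name="human", catalogue=CATALOGUE):
    """Chain of 3-D models that must be visited, one at a time, to reach `target`."""
    if name == target:
        return [name]
    for component in catalogue.get(name, []):
        rest = path_to(target, component, catalogue)
        if rest:
            return [name] + rest
    return []

print("\n".join(describe("human", 1)))
print(path_to("finger"))
```

Adding a finer level of detail only adds an entry to the catalogue; the existing entries are left untouched.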

The underlying idea here is that in order to use the 3-D model
representation, the largest unit that has to be manipulated at any one time is small - a single
3-D model - yet the representation of any whole shape may be elaborate.

Thus the decomposition shown in figure 6 should be thought of not as
the process of successively refining a single description, but instead as a representation
system in which the balance between resolution and extent of description is flexible, and can
change rapidly according to the needs of the moment. For instance, one cannot examine the
fine detail of a hand without first reducing the scope of the examination to just the hand
3-D model. If the owner of the hand suddenly moves away, the focus of attention can quickly
be shifted to a model near the top of the hierarchy in figure 6, since that is the level of
description at which movements of the body as a whole are best described.

We have found the trade-off between scope and detail to be a useful one
for the processes studied by our theory because the information provided at each level of
the representation is just that needed by the processes that use this representation to
interpret an image. For example in the analysis of a projected human figure, the
orientation of the torso relative to the viewer is computed using information about the
orientations and lengths of the limbs relative to the torso as they are projected in the image.
This is just the information that is represented by the human 3-D model. The same holds
true lower down, for 3-D models of smaller parts.

The important overall characteristics of the 3-D model representation
for shape are: (1) the description provided by each 3-D model is quite simple while still
possessing the shape information important to the processes that will use the 3-D model; (2)
this technique produces descriptions that are canonical over variations that are not
important in terms of recognition, at least for the animal shapes examined here; and (3) the
fidelity of the shape representations produced is easily improved, without changing existing
3-D models, by simply adding more 3-D models to the description to represent finer details.

The structure of a 3-D model

The important question for specifying the form of a single 3-D model is
the manner in which the relative dispositions of its axes are specified. There are three
candidate coordinate systems: viewer-centered, object-centered and local.

The viewer-centered system is the one in which comparisons with the
image have eventually to be made. The image, and hence the projected axes computed
from it, are forced by the laws of optics to be based on a spherical coordinate system
centered on the viewer. The difficulty with this system is that the descriptions produced
depend upon the orientation of the viewed object relative to the viewer. For example a
horse facing left produces an entirely different description from a horse facing right in the
image. Minsky's multiple views representation accepts this difficulty and attempts to deal
with each distinct view as a separate problem. A system based on the 3-D model idea
requires that the underlying representation be independent of the viewing angle. This
allows us to reject a viewer-centered coordinate system.

An object-centered coordinate system is one in which each axis that
occurs anywhere in the 3-D model representation of an object is specified in a
circumscribing frame of reference based, for example, on the top-level major axis of that
object. Such a system is a poor one for articulated shapes where axes are not rigidly
connected. For example, if one moves an arm, one's fingers usually move with it. If each
finger axis were represented solely by reference to the overall body axis, almost any
movement of a high-level 3-D model in the 3-D model representation would render obsolete
all information below that level in the hierarchy.

The natural choice is therefore to distribute the coordinate system,
making it local to each 3-D model. The position of the finger axis is specified relative to
the hand, which in turn is specified relative to the arm, and this, to the torso. In order to
discover the position of the finger relative to the torso, these intermediate relations need to
be examined and interpreted. The crucial advantage of local 3-D coordinate systems is that
they preserve the modularity of the 3-D model representation, which in turn enhances its
flexibility. Using this scheme, it is easy to represent an elephant with one leg replaced by an
automobile tyre, given 3-D models for an elephant and a tyre.
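
The benefit of distributing the coordinate system can be illustrated with a small
sketch. It is not part of the theory as stated: the class name Part, the function locate,
and all numerical offsets are hypothetical, and each local relation is reduced to a plain
translation (no rotation) to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One axis, positioned relative to its parent only (hypothetical structure)."""
    name: str
    offset: tuple                      # translation relative to the parent part
    children: list = field(default_factory=list)


def locate(root, name, base=(0.0, 0.0, 0.0)):
    """Walk the chain of local relations and return the position of `name`
    expressed in the frame that contains `root`, or None if it is absent."""
    here = tuple(b + o for b, o in zip(base, root.offset))
    if root.name == name:
        return here
    for child in root.children:
        found = locate(child, name, here)
        if found is not None:
            return found
    return None


# Illustrative values only: finger relative to hand, hand to arm, arm to torso.
finger = Part("finger", (0.0, 0.0, 0.1))
hand = Part("hand", (0.0, 0.0, 0.3), [finger])
arm = Part("arm", (0.2, 0.0, 0.0), [hand])
torso = Part("torso", (0.0, 0.0, 0.0), [arm])

before = locate(torso, "finger")
arm.offset = (0.2, 0.1, 0.0)           # move the arm; the finger record is untouched
after = locate(torso, "finger")
```

Moving the arm changes where the finger is found relative to the torso, yet only the
arm's own relation was edited. Under a single object-centered frame, every record below
the arm would have needed rewriting.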

In order to specify the coordinate system for the 3-D model
representation, it therefore suffices to describe how the spatial dispositions of the axes in a
single 3-D model are determined relative to its principal axis. Figure 7 illustrates how this is
accomplished. The length and orientation of an auxiliary axis is specified in spherical
coordinates {inclination, girdle, size} or (ι, φ, s), where the principal axis itself defines the
unit vector (0, 0, 1.0). The precise position of the auxiliary is determined by specifying its
origin as a triple in cylindrical coordinates {embedding-girdle, embedding-distance, position}
or (θ, r, p) about the principal axis. Once again the axis itself is (0, 0, 1.0). For both of
these specifications, the direction of the zero girdle-angle, φ, has to be supplied in order to
fix the angular rotation about the principal axis. The set

{inclination, girdle, size, embedding-girdle, embedding-distance, position}
specifies the position of one cylinder relative to another, and it is called an adjunct relation.
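
As a rough illustration of how the six numbers of an adjunct relation fix an auxiliary
axis, the sketch below converts one into a pair of end points in the frame of the
principal axis. The conventions are assumptions made for the example, not taken from the
text: the principal axis runs from (0, 0, 0) to (0, 0, 1), girdle angles are measured
about it from the +x direction, inclination is measured from the principal axis, and
lengths are in units of the principal axis length. The names AdjunctRelation and
endpoints are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AdjunctRelation:
    inclination: float         # degrees away from the principal axis
    girdle: float              # degrees about the principal axis
    size: float                # length, in units of the principal axis
    embedding_girdle: float    # degrees about the principal axis
    embedding_distance: float  # radial distance from the principal axis
    position: float            # distance along the principal axis


def endpoints(rel):
    """Return (origin, tip) of the auxiliary axis in the principal-axis frame."""
    eg = math.radians(rel.embedding_girdle)
    origin = (rel.embedding_distance * math.cos(eg),
              rel.embedding_distance * math.sin(eg),
              rel.position)
    inc = math.radians(rel.inclination)
    g = math.radians(rel.girdle)
    direction = (rel.size * math.sin(inc) * math.cos(g),
                 rel.size * math.sin(inc) * math.sin(g),
                 rel.size * math.cos(inc))
    tip = tuple(o + d for o, d in zip(origin, direction))
    return origin, tip


# Hypothetical appendage: attached on the surface at the far end of the
# principal axis and sticking straight out sideways.
side_branch = AdjunctRelation(inclination=90.0, girdle=0.0, size=0.5,
                              embedding_girdle=0.0, embedding_distance=0.1,
                              position=1.0)
origin, tip = endpoints(side_branch)
```

Because the relation is expressed entirely in the frame of the principal axis, the same
six numbers remain valid however the whole model is oriented relative to the viewer.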

Figure 7 shows the adjunct relation between the torso and left front leg
of a cow. The leg starts at (-100°, 0.15, 0.0), that is, at the front end of the torso, displaced
away from the axis of the torso by the torso's radius and located slightly ventral to the left
side. From that point, the leg axis extends in a ventral direction about 2/3 of the torso's
length (90°, 180°, 0.66). Finally, the thickness of the leg is much less than that of the torso.
The angles and lengths that occur in these relations are represented in a
system that specifies both a value and a tolerance (table 1 in the appendix). For example, it
is possible to state that a particular axis (like the leg of a quadruped) is connected rather
precisely at one end of the torso, is approximately vertical with about a ten degree tolerance
out to the side (in girdle-angle), and a tolerance in inclination of about 70 degrees, which
includes positions through which the leg normally swings.
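
A value-plus-tolerance entry of this kind might be sketched as follows. The tolerances
(ten degrees in girdle-angle, 70 degrees in inclination) come from the example above; the
nominal values, the class name Toleranced and the wrap-around treatment of angles are
assumptions made for illustration, and the actual encoding in table 1 of the appendix may
differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Toleranced:
    value: float
    tolerance: float

    def accepts(self, measured):
        """True if a measured length or inclination lies within tolerance."""
        return abs(measured - self.value) <= self.tolerance

    def accepts_angle(self, measured_deg):
        """As accepts(), but treats angles as wrapping round at 360 degrees."""
        diff = (measured_deg - self.value + 180.0) % 360.0 - 180.0
        return abs(diff) <= self.tolerance


# Hypothetical nominal values for a quadruped's leg.
leg_girdle = Toleranced(value=180.0, tolerance=10.0)
leg_inclination = Toleranced(value=90.0, tolerance=70.0)

swinging_leg_ok = leg_inclination.accepts(30.0)      # well within the swing
splayed_leg_ok = leg_girdle.accepts_angle(160.0)     # 20 degrees out to the side
```

A wide tolerance on inclination lets a single stored relation cover the whole swing of
the leg, while the narrow tolerance on girdle-angle still rejects a leg that sticks out
sideways.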

The Image-Space Processor
We have seen how structural information about a shape is held by its 3-D
model representation in a coordinate system that is essentially distributed. We also noticed
that information from the image is expressed in a viewer-centered coordinate frame. These
two systems have to be related, and the mechanism for accomplishing this is called the
image-space processor.






Since our system for representing shape is based on 3-D models, each of
which is simply a set of axes organised around a principal axis, the computational
machinery needed in the image-space processor is very simple. It can be thought of as a
tabular or simple arithmetic device that is able to maintain the representation of a
distinguished vector, called the $axis, in viewer-centered spherical coordinates. In addition,
the image-space processor can represent one movable vector called the $spasar (for space-
arrow). The important point about the processor is that coordinates for the $spasar are
available simultaneously in a frame centered on the viewer and in one centered on the $axis,
so that specifying the $spasar in either frame makes it available in the other.

The $axis essentially defines a local coordinate system. It is specified by
its two endpoints, and by one other point that defines the zero girdle-angle. The $spasar is
defined by its two end points. Thus the image-space processor takes five points specifying
the $spasar and $axis in the viewer-centered system and produces an adjunct relation
specifying the disposition of the $spasar relative to the $axis. The reverse transform, also
computed by the image-space processor, takes a specification of the $axis and a relation
specifying the $spasar relative to the $axis, and produces the coordinates of the $spasar's
end points in the viewer-centered system. Since the viewer-centered system is expressed in
spherical coordinates (θ, φ, r), predicted projections on the image may be obtained by simply
ignoring the radial component r.
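
The two transforms can be sketched in a few lines. This is only one possible reading: for
brevity the viewer-centered frame is held in Cartesian form and converted to spherical
angles at the last step, points are expressed in local Cartesian coordinates rather than
as a full adjunct relation, and every function name is hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def times(a, k):
    return tuple(x * k for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    return times(a, 1.0 / math.sqrt(dot(a, a)))


def axis_frame(start, end, girdle_ref):
    """Local frame from the two end points of the axis plus a third point
    that fixes the zero girdle-angle. The axis itself becomes (0, 0, 1)."""
    z = sub(end, start)
    length = math.sqrt(dot(z, z))
    z = unit(z)
    g = sub(girdle_ref, start)
    x = unit(sub(g, times(z, dot(g, z))))   # remove the component along the axis
    y = cross(z, x)
    return start, length, (x, y, z)


def to_local(frame, point):
    """Viewer-centered point -> coordinates in the axis frame."""
    origin, length, basis = frame
    d = sub(point, origin)
    return tuple(dot(d, b) / length for b in basis)


def to_viewer(frame, local):
    """Axis-frame coordinates -> viewer-centered point (the reverse transform)."""
    origin, length, (x, y, z) = frame
    return tuple(origin[i] + length * (local[0] * x[i] + local[1] * y[i] + local[2] * z[i])
                 for i in range(3))


def project(point):
    """Direction of a point as seen by a viewer at the origin: the spherical
    angles are kept and the radial component is simply dropped."""
    px, py, pz = point
    r = math.sqrt(px * px + py * py + pz * pz)
    return math.atan2(py, px), math.acos(pz / r)


frame = axis_frame((0.0, 0.0, 5.0), (2.0, 0.0, 5.0), (0.0, 1.0, 5.0))
tip_local = to_local(frame, (1.0, 3.0, 4.0))
tip_back = to_viewer(frame, tip_local)
```

Converting the local Cartesian coordinates into the spherical and cylindrical triples of
an adjunct relation is a further, purely trigonometric step that is omitted here.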

An example will help to clarify these points. If the orientation and
location of the $axis relative to the viewer represents the torso axis of an imaginary horse
and the appearance of its neck axis is required, the appropriate adjunct relation, giving the
disposition of the neck axis relative to the torso axis, is read from the horse 3-D model and
the image-space processor is used to set the $spasar relative to the $axis as indicated by this
relation. This computation produces the coordinates of the $spasar and thus the horse's
neck axis in the viewer's reference frame, and its projection is obtained by omitting the
radial components.

In the simplest implementation of the image-space processor, the $axis is
a passive element. Rotating it or translating it in the viewer's space-frame requires the use
of the $spasar to compute its new coordinates. During recognition, two circumstances occur
that cause one to move the $axis. Firstly, the orientation of a 3-D model is adjusted
incrementally relative to the viewer until a disposition is found where the predictions from
the 3-D model agree best with those obtained from the image. And secondly, when a piece
of a 3-D model is to be examined in finer detail, one of the appendages of the model at the
current level of study will become the principal axis for a more specialized model that deals
with the fine structure of a sub-part. When shifting downwards to study the sub-part, the
$axis and its implied reference frame has to be moved to the new principal axis. For
example, when using the 3-D model for the overall structure of a man, the $axis will be
bound to the torso. In order to move to a model for one of the arms, the $spasar must first
be moved to that arm, and the $axis may then be transferred to the position computed by
the $spasar.

The Catalogue of 3-D Models

The 3-D model representation of shape has been defined, and we have
seen in principle how the image-space processor relates the specifications found in a 3-D
model representation to those being delivered from an image. The third major structure in
the theory is a catalogue of stored 3-D models (see figure 8), from which individual 3-D
models are freely selected and refined during the construction of the 3-D model
representation for a given physical shape. The catalogue is indexed in various ways, so that
incomplete shape information obtained during the analysis of an image causes a particular
3-D model to be selected, and this model, in turn, aids the further interpretation of the
image by providing constraints on the possible dispositions of the axes found there.

The 3-D model catalogue may be thought of as a vocabulary of shape
descriptions, and part of the process of recognition in our theory corresponds to the selection
of increasingly specific 3-D models at each level of the 3-D model representation that is
being built for the current image. Notice that making a 3-D model representation more
specific by substituting increasingly specialized 3-D models within it is distinct from
augmenting it with extra detail by adding new 3-D models to its fringes. In the first case,
one might for example switch from an overall 3-D model for a quadruped to one for a
horse; and in the second, one might add to the existing representation a 3-D model for a
wart in the middle of one flank.

The 3-D model catalogue is organized in a hierarchy of increasing
specificity. The topmost level contains the most undifferentiated description available,
which is the 3-D model for a single cylinder. It is the top-level description of every shape in
the catalogue. For this paper, we restrict the catalogue to those of a few animals, so at the
next level of detail, there is a general quadruped shape, a primate shape, a bird-like shape,
and various limbs. These schema are very general; for example, the quadruped shape
specifies only that there are six appendages with certain constraints on their positions and
dispositions, but with only a very general specification of the types of limbs involved.

The 3-D model catalogue does not respect the difference between 3-D
models for an object and its parts; its hierarchy simply traces lines of increasingly
specialized description. Thus, 3-D models for the component parts of an object (legs, arms,
ears, fingers, navels) are also arranged in the hierarchy of increasing specificity, while
sharing the same top-level description of a single cylinder. For example, the hierarchy for a
limb starts with the cylinder, next decomposes into two segments (like figure 6c), and each
segment has its own subdivisions. In addition to this, the "general" (i.e. undifferentiated)
limb 3-D model differentiates into forelimb and hindlimb, these into horse-forelimb, cow-
forelimb, etc. At each level of specificity, a 3-D model has internal references to component
sub-parts - for example all limbs have upper and lower components - and of course the
upper-limb component of a horse-foreleg model differs from the upper component of a
human-arm model.
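
One simple way to picture a hierarchy of increasing specificity is as a table that records,
for each 3-D model, the more general model it specializes. The sketch below uses model
names mentioned in the text and in figure 8. The flat-table layout and the function names
are assumptions for illustration only, and say nothing about how the catalogue is actually
indexed.

```python
# Each entry maps a model to the more general model it specializes.
PARENT = {
    "cylinder": None,
    "limb": "cylinder",
    "quadruped": "cylinder",
    "primate": "cylinder",
    "bird": "cylinder",
    "forelimb": "limb",
    "hindlimb": "limb",
    "horse-forelimb": "forelimb",
    "cow-forelimb": "forelimb",
    "cow": "quadruped",
    "horse": "quadruped",
    "giraffe": "quadruped",
}


def lineage(name, parents=PARENT):
    """Chain of descriptions from the single cylinder down to `name`."""
    chain = []
    while name is not None:
        chain.append(name)
        name = parents[name]
    return chain[::-1]


def specializations(name, parents=PARENT):
    """Models one step more specific than `name`."""
    return sorted(child for child, parent in parents.items() if parent == name)


horse_limb_path = lineage("horse-forelimb")
quadruped_kinds = specializations("quadruped")
```

Whole objects and their parts sit in the same table, and every lineage begins with the
single cylinder, as the text requires.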

The extent of this repertoire of shapes affects the efficiency of the
computations for describing shapes presented to the system, but it does not limit one to them.
For example, if presented with a favourable view of a horse like that in Figure H, a very
limited system would be able to construct the description of its shape without the aid of a
quadruped model using only single cylinder models, but it would take more time than if the
quadruped model were available and used. Once the analysis of the shape in an image is



Figure 8. The 3-D model catalogue contains a repertoire of shapes organized from the
general to the specific. It is consulted several times during the analysis of an image, and
with its help a 3-D model representation of the viewed shape is constructed. At the top level
is the most general model of all, a single cylinder. At the next level are models for general
categories of shape; those listed here are for a quadruped, a primate, a bird and a limb. At
the next level of differentiation, specific types of these general categories are represented.
The constraints imposed by using a model at one level in the catalogue to interpret an
image often give sufficient new information to enable one to select correctly a more
specialised model. The organization exhibited in this figure is orthogonal to the
organization depicted in figure 6.



[Figure 8 labels: CYLINDER; LIMB, THICK-LIMB, THIN-LIMB; QUADRUPED, COW, HORSE, GIRAFFE;
PRIMATE, HUMAN, PRIMATE; BIRD, OSTRICH, DOVE]



accomplished, the newly constructed 3-D models can be assigned to the catalogue as new
models to be used to help interpret subsequent images. This step involves a considerable
amount of indexing.

An important feature of the 3-D model catalogue is the extreme
flexibility with which individual 3-D models may be used during the construction of a 3-D
model representation for a given image. This is of course essential during the process of
recognition, where the descriptions of the different parts of an object evolve independently
to a certain extent. For example, one might at a particular instant be using a quadruped
model, with rather general associated leg, neck and head models supporting the analysis.
The constraints supplied by the head model allow a sufficient amount of new information
to be obtained from the image so that the newly specialized description can be used to access
the particular 3-D model for a horse-head directly via the catalogue's indexing mechanisms.
This then allows the developing representation to be further specified both through
improved specialization of the 3-D model selected for the whole animal's shape, and through
improved specialisation of the models for other components of the shape such as the head
and legs.

III: The processes of the theory

We have seen how 3-D shapes are represented, and the mechanisms by
which the representation is translated into quantities that may be measured from an image.
We now turn to the more dynamical aspects of the theory, and these fall into two parts.
First, how does one select an appropriate 3-D model given only the 2-D stick figure derived
from an image? And second, having obtained a candidate 3-D model, how does its frame of
reference come to be specified accurately relative to the viewer's? The basic strategy of our
approach uses the principle of least commitment (Marr 19%b), which states that nothing
should be done that may later have to be undone. At each stage, action is based on
information and constraints that are reasonably certain, and is designed to produce new
information and fresh constraints that will help to guide the analysis towards the desired
goal.

This part of the theory is only outlined; in fact it lies almost outside the
3-D representation module, since information from many other modules and interactions
with them play an unavoidable role in the analysis of any but the simplest images.

The two homology problems
1: Accessing a suitable 3-D model

The first problem is how to obtain a suitable 3-D model. The database
contains a large store of them, and we have to use information from the image to select one.
The stored 3-D models range in specificity from the very general to the very particular
(from a single cylinder to a giraffe), so that accessing the 3-D model database with a given
set of features would in general cause the indexer to return many possible models. The
principle of least commitment implies that one should never use a model that is more
specific than current knowledge warrants, so it is inappropriate to index very specific models
under very general attributes. Hence the access paths in the database behave more like a


3-D MODEL $QUADRUPED

Component Axes

1 - $TORSO
2 - $BUST
3 - $LIMB
4 - $LIMB
5 - $LIMB
6 - $LIMB
7 - $TAIL



Figure 9. The homology problems. Previous visual processes deliver a datastructure like
that exhibited in (a), where each axis is associated with a cylinder width, and the
connectivity is explicitly available. The first homology problem is to select a suitable 3-D
model from the catalogue. The result of the computations carried out here is the assignment
of a quadruped 3-D model to this problem. Next, a homology must be established (so far as
is possible) between the axes in the image and the component axes of the quadruped 3-D
model. The result of this step is shown in (b). At this point the viewing angle is still
unspecified, and only rather general information has been used to establish the homology
with this unspecialized 3-D model.



decision tree than they would if every item were indexed independently. Once a general
model like a quadruped has been retrieved and used to describe the image, it forms a local
context through which more specialized features of that model can access more specialized 3-
D models indexed under it.

Suppose that one is presented with a stick-figure image like that in
figure 9a. To begin with, nothing is known about the perspective from which the object is
being viewed, so the initial 3-D model must be accessed using information that is preserved
by perspective transformations. Connectivity is not destroyed by perspective
transformations, nor are quantities like the fractional distance down one axis at which
another connects to it, unless the object is being viewed from very close by. Spurious
connectivities can be introduced if one axis crosses in front of another and if the reason is
not recognized lower down, but existing connections cannot be destroyed, only obscured.
Hence in order to use connectivity information, when measuring which database items best
match a given configuration, unexplained errors of omission are treated much more
seriously than unexplained errors of commission.
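
The asymmetry between the two kinds of error might be expressed as a weighted count, as
in the sketch below. The function name, the two weights and the assumption that image
axes have already been given tentative model labels are all hypothetical; the text says
only that omissions should count much more heavily than commissions.

```python
def connectivity_mismatch(model_links, image_links,
                          omission_cost=5.0, commission_cost=1.0):
    """Score how badly the connections seen in the image fit a stored model.

    An omission is a connection the model requires but the image lacks; a
    commission is an extra connection in the image, which may be spurious
    (one axis crossing in front of another). The weights are illustrative.
    """
    model = {frozenset(link) for link in model_links}
    image = {frozenset(link) for link in image_links}
    omitted = model - image
    extra = image - model
    return omission_cost * len(omitted) + commission_cost * len(extra)


model = [("torso", "bust"), ("torso", "tail"), ("torso", "limb-1")]

# Image A shows every model connection plus one apparent crossing.
image_a = model + [("limb-1", "tail")]
# Image B lacks the torso-tail connection altogether.
image_b = [("torso", "bust"), ("torso", "limb-1")]

score_a = connectivity_mismatch(model, image_a)
score_b = connectivity_mismatch(model, image_b)
```

With these weights a model that is contradicted by a missing connection is ranked well
below one that merely fails to explain an extra crossing.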

The weaker sort of information is girdle angles, inclinations, and the
relative lengths of axes. It is easier to take advantage of these later on, when the image-
space processor has delivered at least partial results about the three dimensional orientation
relative to the viewer; but it is possible to do something with them early on. This comes
about through weak, gross clues. For example, if the 2-D length of the "neck" significantly
exceeds the apparent length of the "torso" in the image, and if the torso does not seem
abnormally foreshortened when compared with the length of the "legs", the image is likely to
be a giraffe. In other words, lower bounds on the lengths of limbs can often be inferred,
and are sometimes useful. Another important type of clue concerns major differences in the
girdle-angles of two axes that are connected to a common one. For example, the neck and
the tail often point in very different directions - one up and one down - and this obvious
difference can usually be seen without a sophisticated 3-D analysis. In a pipe-cleaner
animal, this very rough difference can help to determine which end of the animal is which.

The important point about the initial index access, and all subsequent
accesses until an adequate description has been built, is that the newly selected model is used
to structure information that is already available and is instrumental in obtaining further
shape information from the image. This added information is then used to select a more
specific model, and the process repeats itself until enough information is gathered for the
purpose at hand.

The path to a 3-D model is not always direct. When an important stick
in the stick-figure is foreshortened and component shapes are insufficient for determining
the 3-D model, other kinds of strategies are needed. An interesting example is a water-pail
(see figure 2). When seen from the side, the image of a pail segments naturally into its
generalized cylinder description in which the pail is represented as the slice of a cone and
the axis is vertical (figure 2c). If one looks down from above however, one essentially sees
two circles joined by the sloping sides. The principal axis of the pail would appear as a
point from this perspective (figure 2d), and if the pail's handle were missing or only vaguely
defined in the image, there would be no strong component clues to work with.



In order to access the correct 3-D model despite these obfuscations, some
idea of depth has to be introduced into the analysis before addressing the 3-D model index
can be successful. In the case of the pail, some process has to realise that the two circles
might be separated in depth, and that if they are, they could be separated by a considerable
distance. The clues that signal this in monocular images include radial symmetry and
nuances of shadow and highlight, which leads us to expect that much of the analysis of
lighting and shadow can influence the processing at exactly this stage of recognition. We
think of the computations that take place here as deploying the $spasar to construct from
the image a primary 3-D model, that consists at first of an axis in depth whose
circumscribing surface is bounded by the two visible circles, and to which extra details - like
hollowness, the closure of one end of this surface by an orthogonal plane, and possibly the
addition of a cross-strut to account for the handle - are added. At some point during the
construction of this description, the indexer is successful at finding a match with some near
antecedent of the bucket 3-D model in the catalogue. If an "unconventional view" becomes a
common view, it would become profitable to index the appropriate 3-D model under the
special features that obtain for that view.

2: Matching the image to a model

Once a 3-D model has been selected, its component axes must be paired
with sticks in the stick-figure image. Since the ways in which a 3-D model is selected vary
considerably, the association between these elements is not always automatic. Often, some of
the associations will remain ambiguous. For example, imagine the silhouette of a horse
from the side: the legs are easily identified but the left and right forelegs cannot be
distinguished without further information. What is important in many cases is that a
particular stick from the image is one of the legs, since the legs are roughly parallel and it is
their orientation rather than their specific identity that is important for computing the
figure's shape.

The information available for making these associations increases as the
processing proceeds. Initially, positional information along the principal axis of the stick
figure is depended upon most heavily. Often, clues that are available at this stage include
the relative thicknesses of the shapes round the stick axes (the neck of a horse is much
thicker than the legs), and the decompositions of component sticks (the tail and legs of a
horse may be roughly straight, but the bust has two components that always make a large
angle with one another). Symmetry or repetition can also be important for disambiguating
the components of a stick figure. For example, the legs of a horse are all the same thickness,
are roughly parallel, and because of this have roughly the same length in the stick-figure
image, distinguishing them from the tail. Also the legs and tail are usually on one side of
the torso while the bust extends to the other side in the image of a horse. Collectively, such
clues are often sufficient to disambiguate the major components of a 3-D model.

Relaxation
The final part of the theory assumes that the image has been described
by a 3-D model with which a homology has been established, and describes how the model


Figure 10. Relaxing a stored model onto the stick figure derived from the image. Once a 3-
D model has been selected and associations have been made between the axes of the model
and the sticks computed from the image, the approximate orientation of the model relative
to the viewer is computed by a hill-climbing algorithm using the image-space processor.
This process is carried out with the $axis positioned so that its projection coincides with the
stick associated with the model's principal axis (as indicated by the double-headed arrows
above). With this arrangement, the appropriateness of a proposed principal axis orientation
can be judged by using the $spasar to compare the consequent projections of the model's
limbs with the associated sticks in the image. The $axis can be rotated in two dimensions
without moving its projection away from its assigned stick in the image. It can be dipped
toward or away from the viewer and it can be rotated about its own axis. In the figures
above, dark lines indicate sticks computed from the image and light lines are projections
computed using the $spasar. The top sequence (a), (b), (c) shows the projected axes of the
quadruped model for different rotations about the $axis while its ends are equidistant from
the viewer. In the lower sequence (d), (e), (f) the tail end of the $axis is moved slightly
farther away from the viewer.







Figure 11. Views of an object in which an important axis is foreshortened are surprisingly
common. From only one of these views of a camera, (c), may its two main axes be recovered
straightforwardly from the image. Figures (d) through (f) show how this happens, by
displaying the axes for each of the views (a) - (c) within a line drawing of the overall shape.
Views (a) and (b) fall into the same class as the top view of a water-pail (figure 2b).
According to the theory, the class of such views provides a rigorous definition of the
intuitive notion of an "unconventional" view (Warrington & Taylor 1973).













comes to possess the appropriate 3-D orientation relative to the viewer. This is accomplished
by an incremental hill-climbing procedure which uses the image-space processor and
information in the 3-D model to match the model to the axes derived from the image.

The basic idea here is to use the image-space processor to compute the
discrepancy between a given 3-D model orientation and the constraints imposed by the stick
figure image. The $axis is set to an arbitrary initial orientation (slightly approaching or
receding from the viewer, based perhaps on shading cues) so that its projection is parallel to
the axis of the stick-figure image. Two degrees of freedom are left unconstrained at this
point: the dip of the $axis out of the image plane towards or away from the viewer, and the
unit vector associated with the $axis which determines the rotation about the $axis of the
object's local coordinate system (see figure 10). From a given disposition, the discrepancies
between the 3-D model's projected component axes and the corresponding sticks of the
image can be computed using the image-space processor, and their sum gives an indication
of the goodness of fit of this particular orientation of the $axis. A simple incremental hill-
climbing technique may now be used, that varies the dip of the $axis and the rotation about
it until a suitably good fit is found. Further discussion of the process illustrated in figure 10
may be found in the appendix.
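
A toy version of such a hill-climb is sketched below. It is not the procedure of the
appendix. It assumes orthographic projection with the viewer looking along z, measures
discrepancy as a sum of squared differences between projected limb vectors and image
sticks, and uses a step-halving search. The function names, the limb directions and the
step sizes are all hypothetical.

```python
import math


def project_limb(limb, dip, spin):
    """Image (x, y) of a limb direction given in the frame of the principal
    axis, for a given dip out of the image plane and rotation about the axis.
    The projected principal axis always lies along the image x direction."""
    lx, ly, lz = limb
    cd, sd = math.cos(dip), math.sin(dip)
    cs, ss = math.cos(spin), math.sin(spin)
    axis = (cd, 0.0, sd)
    x0 = (0.0, 1.0, 0.0)               # perpendicular to the axis
    y0 = (-sd, 0.0, cd)                # completes the frame (axis cross x0)
    xl = tuple(cs * a + ss * b for a, b in zip(x0, y0))
    yl = tuple(-ss * a + cs * b for a, b in zip(x0, y0))
    world = tuple(lx * xl[i] + ly * yl[i] + lz * axis[i] for i in range(3))
    return world[0], world[1]          # orthographic: drop the depth component


def misfit(limbs, sticks, dip, spin):
    """Sum of squared discrepancies between projections and image sticks."""
    total = 0.0
    for limb, (sx, sy) in zip(limbs, sticks):
        px, py = project_limb(limb, dip, spin)
        total += (px - sx) ** 2 + (py - sy) ** 2
    return total


def relax(limbs, sticks, dip=0.0, spin=0.0, step=0.2, min_step=1e-4):
    """Incremental hill-climb over dip and spin; halves the step when stuck."""
    best = misfit(limbs, sticks, dip, spin)
    while step >= min_step:
        moves = [(dip + step, spin), (dip - step, spin),
                 (dip, spin + step), (dip, spin - step)]
        err, d, s = min((misfit(limbs, sticks, d, s), d, s) for d, s in moves)
        if err < best:
            best, dip, spin = err, d, s
        else:
            step /= 2.0
    return dip, spin, best


# Hypothetical limb directions, and image sticks synthesized from a known pose.
limbs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5), (0.5, 0.5, -0.5)]
sticks = [project_limb(limb, 0.3, 0.8) for limb in limbs]
start_error = misfit(limbs, sticks, 0.0, 0.0)
dip, spin, final_error = relax(limbs, sticks)
```

A search of this kind can only improve on its starting point. It may settle on a local
optimum, and under orthographic projection it cannot tell a pose from its mirror image in
depth.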

This technique is incomplete as it stands, since the orthogonal projection
of a stick figure looks the same regardless of whether its head is nearer the viewer than its
tail. For animals like a horse, this ambiguity may be resolved by noticing whether the
forelegs or the hindlegs are shorter. For less familiar objects, obscuration or context clues
(what the object is on or in) are probably necessary to disambiguate the two possibilities.

Finally, comparisons with the angles of the image are only a partial source
of error information in the hill-climbing computation. Used alone, they would make the
computed disposition of the $axis too sensitive to slight variations in the dispositions of the
component axes in the image. We therefore include in the error calculation discrepancies
between the dip of the $spasar away from the viewer and the dip computed from the image
using perspective information (does the circumscribing cylinder thicken at the nearer end as
it should?), and length information (for this orientation of the $spasar is its projection too
long or too short compared with the image?). Our grasp of this part of the theory is
adequate only for simple images, and we shall develop it further elsewhere.

IV: Discussion

The discussion falls naturally into two parts, one concerned specifically
with vision, and the other with the organization of information in a wider sense.

1: 3-D representation theory
There are five main points to our theory. They are:

(1) The 3-D disposition of an object is represented primarily by a stick-figure configuration,
where each stick stands for one or more axes in the object's generalized cone representation.

(2) This configuration is described by a loosely hierarchical assertional database, called a 3-
D model representation. Use of this database is extremely free and flexible, and it can
support levels of description that cover the spectrum from very coarse to very fine detail. It
also satisfies the principle of graceful degradation, which states that partial information
should yield partial results.

(3) In order to be useful, this database has to be interpreted through an (essentially)
analogue mechanism, called the image-space processor. In its minimal implementation, this
processor can be thought of as maintaining the representation of one vector in a local space-
frame.

(4) The image-space processor's instruction set is small. Its most important features are:
(a) the ability to interpret an adjunct relation between the $axis and the $spasar; and
(b) the ability to relate object-centered coordinates to a viewer-centered frame of
reference.

(5) The image-space processor can deliver information about the lengths and orientations of
the appearance of the $axis and $spasar. These help the system to "rotate" its model into
the correct 3-D disposition relative to the viewer.

The immediate and most accessible prediction that follows from the
theory concerns the characterization of Warrington & Taylor's (1973) "unconventional" views.
According to our theory, the most difficult views to handle are those in which an important
axis is foreshortened, since in these cases straightforward segmentation fails to recover them
from the image. We therefore predict that these are the views that Warrington & Taylor
would label unconventional, and on which their patients will fail most easily. Such views
are by no means uncommon, and figures 2 and 11 contain two familiar examples.

U is hard but not impossible ta derive detailed nf iirnpltysinlogical 
predictions from the Theory, particularly pn=d If n nn s about the likely implementation of the 
image-space processor fNishibara, in prepa ra 1 1 on ). There are however several general 
points about The theory that lead US to take it seriously as a model For psychology, and 
which therefore encourage us to derive more derailed predictions. They flie: 
{]) F'ipe- cleaner animals are almost as easily recognisable as are line-drawings of animals r 
despite their very abstract relation to the original This would not be MJrpTtilng if pipe- 
cleaner animals were in some sense *K traded from the image during the normal course of its 
interpretation {a* OUT theory asserts), but it would be surprising iF not. 

(2} The loosely hierarchical structure oF nur 3-D models, has many compu taiionil advantages 
thai are almost bound to be shared by the psychological representation, even if the 
psychological represen ration is Otherwise veiy different. The advantages include a variable 
level or detail in Ihe 3-D model System,, and the Flexibility with which different J-D models 
may be accessed and combined to form new modeks. If a system has 3-D models for a hoTse 
and For a man. it will be able :o build the description oF a centaur. 

(3) An important part of the theory is the simplicity of the image-space processor. The only
requirements are that it be able to manipulate one vector in a space-frame, and relate the
specification in that frame to one in the viewer-centered frame. By using the stick-figure
representation, the essentials of the spatial organization of a shape may be manipulated at
very low computational cost.

(4) The mechanisms of the theory can handle 3-D shapes, and so are inherently powerful
enough to describe 2-D patterns, such as the configuration of features on a face. The only
requirement is that such patterns should be described relative to axes that are constructed
within them, since the structure of a 3-D model depends on specifying positions in this way.


















It is therefore important for the theory that axes be established early in our perception of
2-D figures. Figure 12 provides positive evidence on this point. In the top row, the shapes
are seen as squares, whereas, along the diagonal, they are seen as diamonds. The diagonal
axis is therefore being constructed during the analysis of this pattern; it influences, and
therefore probably precedes, the description of the shapes of the local elements.

(5) The theory has been implemented and works well for simple images (see the appendix).

Mental rotation experiments

In 1971 Shepard & Metzler (1971) created a set of images by rotating and
reflecting simple objects made of cubes (figure 13). They found that the time taken to
decide whether two images were of identical objects, rather than objects that differed
by a reflexion, varied linearly with the angle through which one object must be rotated in
3-space to become aligned with the other. This finding revived interest in "mental imagery"
and in analogue processes in perception (Cooper & Shepard (1973), Metzler & Shepard (1974),
Shepard (1975)). In addition, Kosslyn (1975) has published evidence for an analogue
component to the processes that interpret mainly two-dimensional structures, like faces and
maps.

The significance of such experiments is controversial (but not the
results). Part of the reason for the controversy seems to have been some difficulty in seeing
how an "analogue" process could benefit the computations that underlie perception and
recognition. We believe that the present theory shows a way in which such a mechanism
could be useful. It asserts that there is indeed an analogue component to the process, namely
the image-space processor, and that it operates on the sticks in a 3-D model. The linearity
that Shepard et al. regard as significant is however not a deep consequence of our theory,
merely the signature of one particularly simple way of implementing it. In the language of
Marr & Poggio (1976b), the linearity is a consequence more of the mechanisms that are used
than of the underlying nature of the computation.

Broadly speaking, if our theory is taken as a psychological model, it
predicts three stages in the assignment of 3-D orientation to views that are not
unconventional. The stages are: (a) A startup period, during which the axes are obtained
from the image, the 3-D model database is accessed, and the two homology problems are
solved. (b) An incremental process, during which the stored 3-D model is relaxed onto the
axes being delivered from the image. This process uses the principal axis together with the
two or three other most suitable ones, and in its simplest incremental implementation the
time for relaxation will vary roughly linearly with the 3-D angle through which the stored
model's space-frame is rotated. (c) Finally, when the best 3-D orientation has been found,
the remaining axes in the model are bound to the image, and fine adjustments made to
their positions and sizes.

The same computational theory certainly has other equally viable
implementations that do not exhibit a linear dependence on the angle. In one of these
implementations, the angle through which the model's frame is rotated at each increment is
half the angle between its present position and the currently predicted final state. In this
implementation, the time to settle would vary with approximately the logarithm of the 3-D
angle. Such a system does not have so starkly simple an image-space processor as the linear
one, but its requirements are still modest relative to what a digital electronic computer can
provide. It must also be borne in mind that unless the subject is very familiar with the
objects being recognized, the interaction between the image, the image-space processor, and
the 3-D model database may be extended and complex. In such cases, any linear dependence
on angle could be masked completely by the process of accessing successively more detailed
3-D models. This is particularly true if the subject is presented with an unconventional
view of an unusual or unfamiliar object, an expectation that suggests several experiments.
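
The contrast between the two incremental schemes can be sketched by counting increments until
the model's frame is within a tolerance of its final orientation. This is only an illustration:
the step size, the tolerance, and the function names are assumptions made here, and the
relaxation is reduced to a single scalar angle rather than a full space-frame.

```python
def steps_linear(angle_deg, step_deg=5.0, tol_deg=1.0):
    """Rotate by a fixed increment each step (hypothetical step size)."""
    remaining, steps = abs(angle_deg), 0
    while remaining > tol_deg:
        remaining -= min(step_deg, remaining)
        steps += 1
    return steps


def steps_halving(angle_deg, tol_deg=1.0):
    """Rotate by half of the remaining angle each step."""
    remaining, steps = abs(angle_deg), 0
    while remaining > tol_deg:
        remaining /= 2.0
        steps += 1
    return steps


if __name__ == "__main__":
    for angle in (40.0, 80.0, 160.0):
        print(angle, steps_linear(angle), steps_halving(angle))
```

Under these assumptions, doubling the angle doubles the step count of the fixed-increment
scheme but adds only one step to the halving scheme, which is the linear versus logarithmic
behaviour described above.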

If one bears this caveat in mind, however, only one of the findings
reviewed by Shepard (1975, item 14, page 100) is unexpected. It comes from Cooper &
Shepard (1973b, condition O), who showed that advance information giving the orientation
but not the identity of the object to be presented is not sufficient to enable subjects to
prepare for it. One might have expected that subjects could rotate their $axis to the
appropriate orientation and leave it there to be bound to the principal axis of a 3-D model
when the image was presented. In order to incorporate this finding, we would need to
assume (for example) that the image-space processor cannot be run unless bound to a 3-D
model (even if only of an arrow), and that whenever the $axis is bound to a radically new
3-D model, the image-space processor is reset. There are some other grounds for wanting
this. The space-frame in the image-space processor needs more than one direction to define
it, and trying to conduct a space-frame round a given vector can lead to problems if the
3-D model is not simple. Secondly, in the real world, one rarely sees two objects at the same
point in the field of view. Therefore, to change to a new 3-D model almost always requires
a change in the direction of gaze. In order to compensate for this in a minimal
implementation, the $axis and $spasar would have to be set to axes in the starting frame, in
order to carry out the primary rotations that allow for the angle of gaze. These arguments
are however weaker than the arguments that support the rest of the theory.

Before we leave the discussion of the visual aspects of the theory, it is
appropriate to note that the 3-D model representation is not without its disadvantages.
Firstly, it is based on the structural axes of a shape, and some attempt at extracting them
must be made before the mechanisms of the theory can be invoked. To do so requires a
great deal of pre-processing of the image, and the theory associated with this is only
beginning to be worked out (see Marr & Poggio 1976b for a brief review). For views in
which a structural axis is foreshortened, this pre-processing may be completely unable to
deliver the correct axes. On such views, a system that operates according to the present
overall theory will be severely disadvantaged. It is not clear whether other methods exist
that would be more successful.

Finally, the criticism about the absence of uniqueness, that we made of
Baumgart's system for the representation of shapes by polyhedral approximation, sometimes
applies to the generalized cone representation. For example, consider a doorway. The
natural axis of most doors is vertical, because they are higher than they are wide. This is
not always true, however, and it is perfectly possible to represent a doorway by an axis
parallel to the width of the door, or even one parallel to its thickness. For most purposes,
there is little difference between using the height and using the width as the principal axis,
but using the thickness may introduce an important new way of looking at the space the
door occupies, since when arranged in this direction, the $spasar carries information about
the direction that is involved in passing through it. In other words, the analysis and use of
holes may depend to a considerable extent on using the $spasar to define what "through" a
hole means. Moreover, we feel that many of the problems of representing and manipulating
the space immediately around the viewer can be handled conveniently and efficiently using
a mechanism like the image-space processor.

2: Broader issues concerning the representation of knowledge

Following the tradition of Bartlett (1932), Minsky (1975) observed that the
"chunks" of reasoning, language, memory and perception ought to be larger and more
structured than most theories in artificial intelligence and psychology allow. This idea is
much more attractive than it is easy to realise, and two factors can be identified as mainly
responsible for the difficulty. The first is: what are the chunks? To answer it, one must
know how to represent a piece of knowledge for the purpose at hand, and much work in
artificial intelligence is devoted to asking this question in different domains. Sometimes it is
answered with conspicuous success (Moses (1974, MACSYMA), Shortliffe (1976, MYCIN),
Duffield et al. (1969, DENDRAL), Sussman & Stallman (1975, EL)).

The second factor is the question of flexibility. If all one's knowledge
resides in canned chunks, little room remains for variations in a scenario that are inevitable
in each of its real world instances. This factor causes particular difficulties in domains that
are ambitiously near to real-world situations, like Schank's (1975) restaurant scenario. Its
effect is to leave these scenarios unable to deal with irregularities.

In the present theory, we propose that the central description of shape is
based on the 3-D model representation. The desired flexibility is achieved by modularity
within the representation, which allows 3-D models to be combined as the image dictates,
and by using the 3-D model catalogue more as an aid to building the current description
than as a set of inviolate subunits that must be assembled unchanged in a rigid way.

The other point that we believe may be important about the theory is the
way it embodies Minsky's assertion that the overall structure of a situation or shape is of
importance to the way its details are recognized and their organisation represented. The key
idea here is the use of coarse overall descriptions of a shape to help extract new information
from the image, which in turn enables the 3-D models involved in its description to be
specialised further so that yet more can be read from the image. Thus, 3-D models for the
overall structure of a shape set up a number of spatial constraints between otherwise
unrelated axes in the image, which then allow specific local "deductions" to refine the details,
possibly causing the overall description to be abandoned. This process is directly
analogous to the situation in Sussman & Stallman's (1975) program for understanding
electronic circuits, where a "high-level" description like "voltage-divider" becomes attached to
part of a circuit, relating components by local laws that are special and informative, and
which allow constraints on the behaviour of that part to be stated accurately and concisely.
In these two domains, these phenomena seem to capture the essence of what makes Minsky's
(1975) article so stimulating, although we feel that the interplay between different levels of
description, which forms a crucial part of the computation, has yet to receive a satisfactory
general formulation. In any case, the important feature of these two examples is that they
specify precisely the information contained in the high-level descriptions. Discussions that
consider only possible implementation mechanisms (frames, semantic networks, property lists,
Conniver methods, actors etc.) are not useful for deciding how information should be
represented in a fresh domain.

The explicit nature of these high-level organizing structures (the
quadruped, the voltage-divider) stands in sharp contrast to methods based on cooperative
phenomena, like the stereopsis theory of Marr & Poggio (1976a), in which the higher-level
"holistic" organizing structure of the computation remains an implicit, not an explicit, aspect
of the network by which it is implemented.

There may be an interesting connection between the specific database
organization that is required by our theory, and a recent study of human semantic memory.
The organization that makes it possible to carry out the construction of a gradually more
specific 3-D model representation is the ordering of the 3-D model catalogue by increasingly
specific shape. Thus the access sequence for 3-D models during the recognition of (say) a
mallard-shape would often be approximately:

small-blob-shape -> bird-shape -> duck-shape -> mallard-shape.
This provides an interesting functional basis for structuring the 3-D model catalogue
according to rules very similar to those exhibited by Warrington (1975), in a recent and
ingenious study of the structure of semantic memory.
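
The ordering by increasing specificity can be illustrated with a toy catalogue in which every
entry names the more general model that it specializes. The dictionary below and the root
name "blob-shape" are hypothetical stand-ins chosen for this sketch; they are not the
catalogue of the implementation described in the appendix.

```python
# Hypothetical catalogue: each model maps to the more general model it refines.
PARENT = {
    "blob-shape": None,
    "bird-shape": "blob-shape",
    "duck-shape": "bird-shape",
    "mallard-shape": "duck-shape",
    "quadruped-shape": "blob-shape",
}


def access_sequence(model):
    """Return the models visited, from most general to most specific."""
    path = []
    while model is not None:
        path.append(model)
        model = PARENT[model]
    return list(reversed(path))


if __name__ == "__main__":
    print(" -> ".join(access_sequence("mallard-shape")))
```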

Finally, we feel that a simple mechanism along the lines of the
image-space processor would be of great benefit to a motor control system. At some level, a motor
system must have access to a representation of body-space in which distances, directions and
trajectories are computed and stored in a form closely related to what visual information can
provide. Yet to execute a motor action, the commands must eventually be couched in terms
of lengths, tensions and joint angles. A mechanism along the lines of the image-space
processor could provide a link between the two, at low computational cost.



Figure 14. This figure exhibits the information contained in 3-D models currently listed in
the restricted 3-D model catalogue used by our present implementation. Each 3-D model,
referenced by its $name, has an associated width and a list of relations among its component
axes. This list of relations specifies the relative spatial dispositions of the components, and
indicates a 3-D model for each one. The accompanying stick figures show the appearance
of these components relative to one another from a particular vantage point.












[Figure 14 listings: for each $name the figure gives a WIDTH symbol and a RELATIONS list
whose entries have the form ($model $component POS p GIRD g INCL i EMBG e EMBD d SIZE s),
each list accompanied by a stick figure. The entries include $quadruped, $primate, $bird,
$torso, $bust, $tail, $limb, $upper-limb, $lower-limb and $hand, followed by more specific
models such as $horse, $cow, $giraffe and $ostrich.]



Appendix: an implementation of the theory

In an ideal situation, a theory of visual information processing would
consist entirely of well defined, circumscribed results accompanied by proofs of existence
and uniqueness, with enough background to show that the results obtained are in fact those
that are important for visual information processing. Marr (1976a) labelled such theories
Type 1. If this were always the case, there would be considerable interest in, but no serious
need for, implementing the theory on a computer. In the present state of the art one is rarely
so fortunate, since even when an individual module can be given a Type 1 theory, the
interactions between it and other modules cannot be satisfactorily analyzed until the other
modules have themselves been graced with Type 1 theories. This is to some extent the
situation here. The core of the present theory is of Type 1 (the 3-D representation is
well-defined, and the image-space processor is precisely formulated), but in analysing the
interactions between the 3-D module itself and other visual or non-visual processes that pass
it clues, many different kinds of information have to be taken into account.

It is therefore important to implement a theory such as this, and writing
its implementation has proved an important technique for clarifying our ideas and testing
different approaches to carrying out a process. For example, algorithms for accessing the
3-D model catalogue are peripheral to the 3-D representation theory, but they are essential to a
program that implements it. In our present implementation these algorithms are quite
primitive, because the main focus of our attention has been on the image-space processor
and on relaxation mechanisms. We have postponed the development of a more
psychological candidate for the catalogue indexing mechanisms until the important access
paths into the catalogue are more clearly defined.

The purpose of this appendix on the implementation is therefore to
clarify some of the concepts peripheral to the theory, and to lend substance to the notions set
out in it by exhibiting them at work. We make no claims that the implementation we
describe here is optimal, and it is certainly not unique.

Database conventions

Each 3-D model in our current implementation is organized around a
special name such as $quadruped, $limb or even $0000, and we call these names $names
(dollar-names). Each $name specifies a memory location in the computer where the various
items of shape information associated with the particular 3-D model are stored, and the $name is
used to reference that information. Many of the $names in this appendix, such as
$quadruped or $hand, are mnemonics for the shape the associated 3-D model represents.
This clarifies the presentation, but is of no other significance.

Figure 14 exhibits the information contained in 3-D models currently
listed in the restricted 3-D model catalogue used by our present implementation. Each 3-D
model, referenced by its $name, has an associated width and a list of relations among its
component axes. This list of relations specifies the relative spatial dispositions of the
components, and indicates a 3-D model for each one. For example the first relation in the
$primate template is

($primate $torso pos W gird N incl N embg N embd S size N),



TABLE 1a.

Representation of angle and of position
Directions and angles that occur in an adjunct relation are expressed in
a vocabulary of symbols that define a value and a tolerance. The longer the
symbol, the more accurately it specifies a value. Tables 1a & 1b define the values
and tolerances of all symbols that occur in the figures.



[Table 1a body: rows POS, GIRD, INCL, EMBG, EMBD, SIZE and WIDTH, each giving an upper
limit, a center and a lower limit for every symbol (numerical entries not reproduced).]

TABLE 1b

[Table 1b body: the same rows, giving an upper limit, a center and a lower limit for each of
the eight two-letter symbols (numerical entries not reproduced).]



Figure 15. The information supplied by earlier visual processes consists of a collection of
two-dimensional stick descriptions, together with information about the thickness associated
with each, and their connectivity. The example shown here has been simplified to include
only the sticks for the top level 3-D model. The dollar name $0000 is the reference for a
new 3-D model that will eventually represent the shape of this stick figure. The FIGURE
property of $0000 relates the organization found in the image to the structure required of a
3-D model, indicating syntactically that stick 0 is the top-level single axis description of the
overall shape, stick 1 is the principal component of this shape, and sticks 2 through 7 are its
auxiliary axes. The table specifies the angular locations of the end-points of each of these
sticks in a viewer-centred coordinate system, along with their thicknesses.




[Figure 15: a stick figure, its FIGURE property (0 (1) (2) (3) (4) (5) (6) (7)), and a table
listing, for sticks 0 through 7, two angular coordinates and a width at end a and at end b.
Note: all values are in degrees.]









which specifies the disposition of the $torso cylinder relative to the $primate cylinder. The
$primate cylinder is the single cylinder representation of the whole primate shape, and the
$torso is one of its component cylinders. Notice that $torso is the dollar-name of another
3-D model. This is how the concatenation rule between 3-D models is implemented.

The other information in this relation consists of attribute-value pairs,
such as pos W and gird N. These two pairs specify the position of the $torso cylinder along
the $primate axis to be W (which means in the middle, between 0.25 and 0.75); and the
girdle-angle to be N (which means within 45 degrees of 0). The symbols N, W, S, WN,
NNNW etc. specify directions and tolerances, the longer symbols specifying a direction
more precisely than the shorter ones. Table 1 defines the values and tolerances of all the
symbols used here.
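
As a minimal sketch of how such a relation might be held in a program, the example below
parses one relation into a model, a component and a dictionary of attribute-value pairs. The
class and function names are hypothetical, the relation string is our reading of the first
entry of the $primate template, and only the interval for the position symbol W (0.25 to
0.75, as stated above) is filled in; the remaining symbols would have to come from Table 1.

```python
from dataclasses import dataclass

# Interval for the position symbol W, taken from the text (0.25 to 0.75).
# Intervals for other symbols are deliberately left out of this sketch.
POS_INTERVALS = {"W": (0.25, 0.75)}


@dataclass
class Relation:
    model: str        # $name of the model that owns the relation
    component: str    # $name of the component being placed
    attrs: dict       # attribute -> symbol, e.g. {"pos": "W", "gird": "N"}


def parse_relation(text):
    """Parse '($model $component attr value ...)' into a Relation."""
    tokens = text.strip().strip("()").split()
    model, component, rest = tokens[0], tokens[1], tokens[2:]
    if len(rest) % 2:
        raise ValueError("attribute without a value")
    return Relation(model, component, dict(zip(rest[0::2], rest[1::2])))


def position_matches(relation, fraction):
    """True if a fractional position along the axis fits the pos symbol."""
    low, high = POS_INTERVALS[relation.attrs["pos"]]
    return low <= fraction <= high


rel = parse_relation("($primate $torso pos W gird N incl N embg N embd S size N)")
```

Because the component field is itself a $name, following it to that model's own relation list
gives the concatenation between 3-D models described above.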

The $torso is the principal axis of the $primate 3-D model, and the
remaining relations held in the model specify the dispositions of the auxiliary axes relative
to it. Here, there are six auxiliary axes, the $head, $tail, and four $limbs.

The form of the input
The information supplied by earlier visual processes consists of a
collection of two-dimensional stick descriptions, together with information about the
thickness associated with each, and their connectivity. Figure 4 in the main text was
obtained from a grey-level image using the techniques described by Marr (1976b), and it
illustrates how information about axes may be obtained from an image. Figure 15 shows an
example that has been simplified by omitting the embedding relations. The dollar name
$0000 is the reference for a new 3-D model that will eventually represent the shape of this
stick figure. The FIGURE property of $0000 relates the organization found in the image
to the structure required of a 3-D model. The information held here indicates that stick 0 is
the top-level single axis model for the overall shape, stick 1 is the principal axis for the first
elaborated 3-D model, and sticks 2 through 7 are the auxiliary axes for this model. In a
more detailed example, these auxiliary axes would themselves decompose into substructures.
It might be the case, for example, that stick 2 (the figure's bust) decomposed into two
component sticks, corresponding to the neck and head. If this added detail had been
included in the input data, (2) would be replaced by (2 (8) (9)) in the FIGURE property of
$0000.
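
The replacement of (2) by (2 (8) (9)) can be sketched with nested lists standing in for the
FIGURE property. The list layout below, a stick number followed by its component entries, is
an assumption made for illustration, as is the function name; it is not claimed to be the
data structure of the original program.

```python
def refine(figure, stick, parts):
    """Return a copy of the nested FIGURE list in which the entry for
    `stick` gains one sub-entry for each stick number in `parts`."""
    head, *children = figure
    new_children = [refine(child, stick, parts) for child in children]
    if head == stick:
        new_children += [[p] for p in parts]
    return [head, *new_children]


# Assumed reading of the FIGURE property of $0000 in figure 15.
figure = [0, [1], [2], [3], [4], [5], [6], [7]]

# Stick 2 (the bust) decomposes into sticks 8 and 9 (neck and head).
refined = refine(figure, 2, [8, 9])
```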

Homology and the primary catalogue access

The Tirst step in the 3-D analysis oF such a figure is to select an 
appropriate 3-D model from the catalogue, and to match it to the incoming stick Figure (the 
two homology problems). This is done by computing estimates of the adjunct? between the 
principal axis or the ihcV figure and its auxiliaries, and (hen selecting that 3-D model whose 
adjunct relations are most simjlaT to the estimated ones. 
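One way to picture "most similar" is a symbol-by-symbol comparison. The sketch below is a deliberately crude stand-in: the two catalogue entries, their names and the three-field relation tuples are hypothetical, and the relations are paired in order, which assumes the second homology problem has already been solved.

```python
def similarity(estimated, stored):
    """Count matching symbols between two lists of relation tuples,
    pairing relations in order (a simplifying assumption)."""
    return sum(a == b
               for est, sto in zip(estimated, stored)
               for a, b in zip(est, sto))

def select_model(estimated, catalogue):
    """Return the name of the catalogue entry with the highest similarity."""
    return max(catalogue, key=lambda name: similarity(estimated, catalogue[name]))

# Hypothetical two-entry catalogue; each relation is (POS, GIRD, INCL).
catalogue = {
    "model_a": [("N", "E", "N"), ("S", "W", "N")],
    "model_b": [("N", "E", "S"), ("N", "W", "S")],
}
estimated = [("N", "E", "N"), ("S", "W", "E")]
chosen = select_model(estimated, catalogue)
print(chosen)
```

A fuller treatment would weight the parameters by their reliability, since, as discussed below, only some of them can be trusted at this stage.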

If the radial coordinates of the end points of the sticks in figure 15 were
known, the image-space processor could be used to compute the required adjunct relations
directly. They are not, but it turns out that useful relations can be obtained this way by
first assuming that all the radial distances are the same, which is equivalent to interpreting
the image as if all its sticks lay perpendicular to the line of sight. This is the starting
configuration for the 3-D model axes, and as the processing continues, better values for the
radial coordinates will be established.
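The equal-radial-distance assumption reduces the first estimate to plane geometry. The following sketch shows the idea under that assumption; the function name `coplanar_adjunct` and the four quantities it returns are illustrative simplifications, not the adjunct relation format or the image-space processor itself.

```python
import math

def coplanar_adjunct(principal, auxiliary):
    """Estimate coarse adjunct parameters from 2-D image sticks.

    Each stick is ((x0, y0), (x1, y1)). All end points are assumed to lie at
    the same radial distance, i.e. in a plane perpendicular to the line of
    sight, so image coordinates can be used directly.
    """
    (px0, py0), (px1, py1) = principal
    (ax0, ay0), (ax1, ay1) = auxiliary
    pdx, pdy = px1 - px0, py1 - py0
    adx, ady = ax1 - ax0, ay1 - ay0
    plen = math.hypot(pdx, pdy)
    alen = math.hypot(adx, ady)

    # Position: where the auxiliary's start projects onto the principal axis,
    # as a fraction of its length measured from the assumed zero end.
    position = ((ax0 - px0) * pdx + (ay0 - py0) * pdy) / (plen * plen)

    # Side: the sign of the 2-D cross product says on which side of the
    # principal axis the auxiliary points.
    cross = pdx * ady - pdy * adx
    side = (cross > 0) - (cross < 0)

    # Inclination: unsigned angle between the two axes, in degrees.
    dot = pdx * adx + pdy * ady
    inclination = math.degrees(math.atan2(abs(cross), dot))

    return {"position": position, "side": side,
            "inclination": inclination, "size": alen / plen}

torso = ((0.0, 0.0), (10.0, 0.0))
leg = coplanar_adjunct(torso, ((1.0, 0.0), (1.0, -6.0)))
neck = coplanar_adjunct(torso, ((9.0, 0.0), (12.0, 5.0)))
print(leg)
print(neck)
```

In this toy example the leg and the neck fall on opposite sides of the torso, which is the kind of sign information that survives even when the inclinations themselves are wrong.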

Figure 16 shows the result of translating this initial configuration into
adjunct relations via the image-space processor. Note that low resolution symbols have been
used in the computed relations, and that new 3-D models for each auxiliary axis have been
created. The girdle-angles depend upon the particular choice of zero girdle direction, which
is initially arbitrary. The only important thing about them now is that some of the relations
have gird E, and some have gird W, which correspond to "above" and "below" the principal
axis. The position parameter at this point is reasonably accurate, up to possible reversal if
the wrong end of the principal axis has been taken as the zero end.

Only positional values along the principal axis provide direct help for
selecting a 3-D model from the catalogue, and even this is subject to a possible polarity
error. The remaining parameters do however provide indirect help, even though they may
be severely distorted by the a priori assumption that the sticks are coplanar. For example,
although the inclinations, girdles and sizes may themselves be incorrect, certain interrelations
among them will be preserved; the neck will have a girdle angle of opposite sign to the
legs, and the legs will all have similar inclinations, girdles and sizes because they are
roughly parallel. This information is sufficient to select the $quadruped model from the
$quadruped, $bird, $primate and other models, and the relations in $0000 can now
be associated with the relations held in $quadruped. This information is inserted into the
newly formed 3-D models under their TEMPLATE properties as shown in figure 17.
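The interrelations that survive the coplanarity assumption can be tested without trusting any single value. The sketch below is a hypothetical check of that kind; the relation values are illustrative low-resolution symbols chosen for the example, and `quadruped_like` is an assumed name, not part of the program described here.

```python
from collections import Counter

# Illustrative coarse adjunct relations for the six auxiliary sticks,
# keyed by stick number: (POS, GIRD, INCL, SIZE). Values are assumptions.
relations = {
    2: ("S", "E", "N", "N"),
    3: ("N", "W", "N", "N"),
    4: ("N", "W", "N", "N"),
    5: ("S", "W", "N", "N"),
    6: ("S", "W", "N", "N"),
    7: ("N", "W", "W", "E"),
}

def quadruped_like(rels):
    """Hypothetical test of the interrelations described in the text."""
    # Shape of each appendage (GIRD, INCL, SIZE), ignoring where it attaches.
    shapes = Counter(r[1:] for r in rels.values())
    leg_shape, leg_count = shapes.most_common(1)[0]
    if leg_count != 4:
        return False
    legs = [s for s, r in rels.items() if r[1:] == leg_shape]
    # Two legs at each end of the principal axis.
    ends = Counter(rels[s][0] for s in legs)
    if sorted(ends.values()) != [2, 2]:
        return False
    # At least one other appendage (the neck) carries a different girdle
    # symbol, standing in here for "opposite sign".
    leg_gird = leg_shape[0]
    return any(r[1] != leg_gird for s, r in rels.items() if s not in legs)

is_quadruped = quadruped_like(relations)
print(is_quadruped)
```

The test depends only on agreement and disagreement among symbols, so it is unaffected if every inclination or size is distorted in the same way.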

The next step is to use information in the $quadruped model to compute
better estimates for the radii, beginning with the principal axis, stick 1. Figure 10 of the
main text shows how our program achieves this. A hill-climbing algorithm is used where
the parameters to be adjusted are the radial coordinate of one of stick 1's end points and the
zero girdle direction of the $axis, which lies along stick 1. The $axis represents the current
attempt at matching the $torso to stick 1, and as the $axis is incrementally rotated the
goodness of fit of the 3-D model is computed by placing the $spasar successively on the
$torso relations of the $quadruped, and accumulating a similarity score between the
$spasar's end-points and the associated sticks in the image. In the top row of figure 10 the
end points of the $axis are equidistant from the viewer for three successive orientations of
the zero-girdle direction. The "appearance" of the $quadruped is computed one axis at a
time, and is shown in lighter lines in the figure. The effect of rotating about the $axis does
not significantly improve the fit. In the bottom row, the radial value of the end of the
$axis has been improved, and now rotation about the $axis leads to a good alignment. This
sets a new estimate for the radial coordinates of stick 1, and now the image-space processor
is used to set new estimates for the radial components of the remaining sticks, based on
relations stored in the $quadruped.
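The two-parameter search can be illustrated with a generic hill-climbing routine. This is a minimal sketch of the general technique, not the program described above: `hill_climb` is an assumed name, and `toy_score` is a made-up smooth function standing in for the similarity score that would really be accumulated by projecting model axes onto the image.

```python
def hill_climb(score, start, steps, max_iter=1000):
    """Greedy coordinate hill-climbing that maximises score(params).

    start is a list of parameter values; steps gives the increment tried
    for each parameter in turn.
    """
    current = list(start)
    best = score(current)
    for _ in range(max_iter):
        improved = False
        for i, step in enumerate(steps):
            for delta in (step, -step):
                trial = list(current)
                trial[i] += delta
                value = score(trial)
                if value > best:
                    current, best, improved = trial, value, True
        if not improved:
            break  # no single step improves the fit: a local maximum
    return current, best

# Toy stand-in for the similarity score: it peaks at a radial offset of 2.0
# and a zero-girdle direction of 30 degrees (both values are arbitrary).
def toy_score(params):
    radial, girdle = params
    return -((radial - 2.0) ** 2) - ((girdle - 30.0) / 10.0) ** 2

params, value = hill_climb(toy_score, start=[0.0, 0.0], steps=[0.5, 5.0])
print(params, value)
```

As with any hill-climbing scheme, the result is only a local maximum, so the quality of the starting configuration matters.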



$0000
  RELATIONS:
    ($0000 $0001  POS N  GIRD E  INCL N  EMBG N  EMBD S  SIZE N)
    ($0001 $0002  POS S  GIRD E  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0003  POS N  GIRD W  INCL N  EMBG N  EMBD S  SIZE N)
    ($0001 $0004  POS N  GIRD W  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0005  POS S  GIRD W  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0006  POS S  GIRD W  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0007  POS N  GIRD W  INCL W  EMBG S  EMBD S  SIZE E)
  WIDTH:    W
  FIGURE:   (0 (1) (2) (3) (4) (5) (6) (7))
  PACKET:   TRUE
  TEMPLATE: $CYLINDER

$0001   WIDTH: N   FIGURE: (1)   PACKET: $0000   TEMPLATE: $CYLINDER
$0002   WIDTH: N   FIGURE: (2)   PACKET: $0000   TEMPLATE: $CYLINDER
$0003   WIDTH: E   FIGURE: (3)   PACKET: $0000   TEMPLATE: $CYLINDER
$0004   WIDTH: E   FIGURE: (4)   PACKET: $0000   TEMPLATE: $CYLINDER
$0005   WIDTH: E   FIGURE: (5)   PACKET: $0000   TEMPLATE: $CYLINDER
$0006   WIDTH: E   FIGURE: (6)   PACKET: $0000   TEMPLATE: $CYLINDER
$0007   WIDTH: E   FIGURE: (7)   PACKET: $0000   TEMPLATE: $CYLINDER





Figure 16. The first step in the processing of the input information is the computation of a
model-centred description of the sticks. Radial information is required in order to use the
image-space processor to compute this description but it is not supplied in the input. It turns
out however that useful relations can be obtained by assuming that the radial distances to
the end points of the sticks are the same. This is equivalent to assuming that all the sticks
of the image are in a plane perpendicular to the line of sight. The result of translating this
initial configuration into adjunct relations via the image-space processor using this
assumption is shown here. Note that low resolution symbols have been used in the
computed relations, and that new 3-D models for each auxiliary axis have been created.
The girdle-angles depend upon the particular choice of zero girdle direction, which is
arbitrary initially. The only important thing about them now is that some of the relations
have gird E, and some have gird W, which correspond to "above" and "below" the principal
axis. The position parameter at this point is reasonably accurate, up to possible reversal if
the wrong end of the principal axis has been taken as the zero end.



$0000
  RELATIONS:
    ($0000 $0001  POS N  GIRD E  INCL N  EMBG N  EMBD S  SIZE N)
    ($0001 $0002  POS S  GIRD E  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0003  POS N  GIRD W  INCL N  EMBG N  EMBD S  SIZE N)
    ($0001 $0004  POS N  GIRD W  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0005  POS S  GIRD W  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0006  POS S  GIRD W  INCL N  EMBG S  EMBD S  SIZE N)
    ($0001 $0007  POS N  GIRD W  INCL W  EMBG S  EMBD S  SIZE E)
  WIDTH:    W
  FIGURE:   (0 (1) (2) (3) (4) (5) (6) (7))
  PACKET:   TRUE
  TEMPLATE: $QUADRUPED

$0001   WIDTH: N   FIGURE: (1)   PACKET: $0000   TEMPLATE: $TORSO
$0002   WIDTH: N   FIGURE: (2)   PACKET: $0000   TEMPLATE: $BUST
$0003   WIDTH: E   FIGURE: (3)   PACKET: $0000   TEMPLATE: $LIMB
$0004   WIDTH: E   FIGURE: (4)   PACKET: $0000   TEMPLATE: $LIMB
$0005   WIDTH: E   FIGURE: (5)   PACKET: $0000   TEMPLATE: $LIMB
$0006   WIDTH: E   FIGURE: (6)   PACKET: $0000   TEMPLATE: $LIMB
$0007   WIDTH: E   FIGURE: (7)   PACKET: $0000   TEMPLATE: $TAIL







Figure 17. The positional distribution of the adjunct sticks (three appendages at each end of
the principal axis), along with similarity relations derived from the gird, incl, size and width
parameters (four appendages, two on each end, are very similar while a remaining one is
very different), are used to select a general 3-D model from the 3-D models held in the
store. The second homology is also carried out here, aligning template properties to the
components of $0000 and relating adjunct relations in $0000 to adjunct relations in
$quadruped (these latter alignments are not depicted here).



$0000
  RELATIONS:
    ($0000 $0001  POS NN  GIRD NW  INCL NN  EMBG SS  EMBD SS  SIZE NN)
    ($0001 $0002  POS SS  GIRD NN  INCL NW  EMBG NN  EMBD NN  SIZE NW)
    ($0001 $0003  POS NN  GIRD SS  INCL WW  EMBG EN  EMBD SS  SIZE NW)
    ($0001 $0004  POS NN  GIRD SS  INCL WW  EMBG WS  EMBD WS  SIZE NW)
    ($0001 $0005  POS SS  GIRD SS  INCL WW  EMBG EN  EMBD NN  SIZE NW)
    ($0001 $0006  POS SS  GIRD SS  INCL WW  EMBG WS  EMBD WS  SIZE NW)
    ($0001 $0007  POS NN  GIRD SS  INCL WS  EMBG EN  EMBD SS  SIZE EN)
  WIDTH:    NW
  FIGURE:   (0 (1) (2) (3) (4) (5) (6) (7))
  PACKET:   TRUE
  TEMPLATE: $QUADRUPED







Figure 18. We see here the state of $0000 just after the completion of the relaxation process
depicted in figure 10. The adjunct relations have been recomputed by the image-space
processor using symbols with a slightly higher level of resolution.



$0000
  RELATIONS:
    ($0000 $0001  POS NN  GIRD NW  INCL NN  EMBG SS  EMBD SS  SIZE NN)
    ($0001 $0002  POS SS  GIRD NN  INCL NW  EMBG NN  EMBD NN  SIZE NW)
    ($0001 $0003  POS NN  GIRD SS  INCL WW  EMBG EN  EMBD SS  SIZE NW)
    ($0001 $0004  POS NN  GIRD SS  INCL WW  EMBG WS  EMBD WS  SIZE NW)
    ($0001 $0005  POS SS  GIRD SS  INCL WW  EMBG EN  EMBD NN  SIZE NW)
    ($0001 $0006  POS SS  GIRD SS  INCL WW  EMBG WS  EMBD WS  SIZE NW)
    ($0001 $0007  POS NN  GIRD SS  INCL WS  EMBG EN  EMBD SS  SIZE EN)
  WIDTH:    NW
  FIGURE:   (0 (1) (2) (3) (4) (5) (6) (7))
  PACKET:   TRUE
  TEMPLATE: $GIRAFFE





Figure 19. The adjunct relations in figure 18 are used again to access a 3-D model from the
3-D model catalogue. This access results in the selection of the $giraffe 3-D model, based
largely on the lengths of the neck and legs relative to the torso, and the first stage of
recognition is complete.



Secondary catalogue access and recognition
Having found the 3-D model orientation that achieves the best fit, we
can now compute a new set of adjunct relations for $0000, the model that is being built.
These are shown in figure 18. Notice that we are now using symbols with a slightly higher
level of resolution. The 3-D model catalogue can now be accessed in search of a more
specific shape. This access results in the selection of the $giraffe 3-D model, based largely
on the lengths of the neck and legs relative to the torso, and the first stage of recognition is
complete. The final state of the 3-D model is shown in figure 19.

Acknowledgments. We thank Drew McDermott, Tomaso Poggio, and Kent Stevens for
valuable criticism and Karen Prendergast for preparing the drawings. This article
describes work reported in M.I.T. A.I. Lab. Memo 341, and it was conducted at the
Artificial Intelligence Laboratory, a Massachusetts Institute of Technology research program
supported in part by the Advanced Research Projects Agency of the Department of Defense
and monitored by the Office of Naval Research, under Contract number N00014-75-C-0643.



References

Adrian, E. D. 1941 Afferent discharges to the cerebral cortex from peripheral sense organs.
J. Physiol. (Lond.), 100, 159-191.

Agin, G. J. 1972 Representation and description of curved objects. Stanford A.I. Memo 173.

Allman, J. M., Kaas, J. H., Lane, R. H. & Miezin, F. M. 1972 A representation of the visual
field in the inferior nucleus of the pulvinar in the owl monkey (Aotus trivirgatus). Brain
Research, 40, 291-301.

Allman, J. M., Kaas, J. H. & Lane, R. H. 1973 The middle temporal visual area (MT) in the
bushbaby, Galago senegalensis. Brain Research, 57, 197-202.

Allman, J. M. & Kaas, J. H. 1974a The organisation of the second visual area (V-II) in the
owl monkey: a second order transformation of the visual hemifield. Brain Research, 76,
247-265.

Allman, J. M. & Kaas, J. H. 1974b A visual area adjoining the second visual area (V-II) on
the medial wall of parieto-occipital cortex of the owl monkey (Aotus trivirgatus). Anat.
Rec., 178, 247-8.

Allman, J. M. & Kaas, J. H. 1974c A crescent-shaped cortical visual area surrounding the
middle temporal area (MT) in the owl monkey (Aotus trivirgatus). Brain Research, 81,
199-213.

Binford, T. O. 1971 Visual perception by computer. Presented to the IEEE Conference on
Systems and Control, Miami, in December 1971.

Blum, H. 1973 Biological shape and visual science (part 1). J. theor. Biol., 38, 205-287.

Brodmann, K. 1909 Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien
dargestellt auf Grund des Zellenbaues. Leipzig: J. A. Barth.

Cajal, S. Ramon y 1911 Histologie du système nerveux de l'homme et des vertébrés, 2 vols.
Paris: Norbert Maloine.

Cooper, L. A. & Shepard, R. N. 1973a The time required to prepare for a rotated stimulus.
Memory and Cognition, 1, 246-250.

Cooper, L. A. & Shepard, R. N. 1973b Chronometric studies of the rotation of mental
images. In: W. G. Chase (Ed.), Visual information processing. New York: Academic Press.



Critchley, M. 1953 The parietal lobes. London: Edward Arnold.

Duffield, A. M. et al. 1969 Applications of artificial intelligence for chemical inference, II:
Interpretation of low-resolution mass spectra of ketones. J. Am. Chem. Soc., 91, 2977-2981.

Hollerbach, J. M. 1975 Hierarchical shape description by selection and modification of
prototypes. M.I.T. Master's Thesis, to appear as M.I.T. A.I. Lab. TR-346.

Kosslyn, S. M. 1975 Information representation in visual images. Cognitive Psychology, 7,
341-370.

Luria, A. R. 1970 Traumatic aphasia. The Hague: Mouton.

Marr, D. 1976a Artificial Intelligence - a personal view. M.I.T. A.I. Lab. Memo 355.

Marr, D. 1976b Early processing of visual information. Phil. Trans. Roy. Soc. B. (in the
press).

Marr, D. 1976c Analysis of occluding contour. M.I.T. A.I. Lab. Memo 372.

Marr, D. & Poggio, T. 1976a Cooperative computation of stereo disparity. Science
(submitted for publication). Also available as M.I.T. A.I. Lab. Memo 364.

Marr, D. & Poggio, T. 1976b From understanding computation to understanding neural
circuitry. In The Visual Field: Psychophysics and Neurophysiology. Neurosciences Research
Program Bulletin, E. Poeppel et al., Eds. (in the press).

Metzler, J. & Shepard, R. N. 1974 Transformational studies of the internal representation of
three-dimensional objects. In: Theories of cognitive psychology: The Loyola Symposium. Ed.
R. Solso. Hillsdale, N.J.: Lawrence Erlbaum Assoc.

Minsky, M. 1975 A framework for representing knowledge. In: The psychology of computer
vision, Ed. P. H. Winston, pp 211-277. New York: McGraw-Hill.

Minsky, M. & Papert, S. 1972 Artificial intelligence progress report. M.I.T. A.I. Lab.
Memo 252.

Moses, J. 1974 MACSYMA - the fifth year. SIGSAM Bulletin, ACM, 8, 105-110. See also
The MACSYMA reference manual, M.I.T. Laboratory for Computer Science, 545
Technology Square, Cambridge, Mass. 02139.

Nevatia, R. 1974 Structured descriptions of complex curved objects for recognition and
visual memory. Stanford A.I. Memo 250.



Schank, R. C. 1975 Conceptual information processing. New York: Elsevier.

Shepard, R. N. 1975 Form, formation, and transformation of internal representations. In:
Information processing and cognition: The Loyola Symposium, Ed. R. Solso, pp 87-122.
Hillsdale, N.J.: Lawrence Erlbaum Assoc.

Shepard, R. N. & Metzler, J. 1971 Mental rotation of three-dimensional objects. Science, 171,
701-703.

Shortliffe, E. H. 1976 Computer-Based Medical Consultations: MYCIN. New York:
American Elsevier Publishing Company Inc.

Street, R. F. 1931 A Gestalt completion test: A study of a cross-section of intellect. In:
Teachers College Contributions to Education, No. 481. New York: Teachers College,
Columbia University.

Ullman, S. 1976 Structure from motion. (M.I.T. Ph.D. Thesis in preparation).

Vatan, P. & Marr, D. 1976 Algorithms for the decomposition of a contour. In preparation.

Vinken, P. J. & Bruyn, G. W. 1969 Eds. Handbook of Clinical Neurology: Vol. 2.
Localization in Clinical Neurology (in association with A. Biemond). Amsterdam: North
Holland Publishing Co.

Warrington, E. K. & Taylor, A. M. 1973 The contribution of the right parietal lobe to object
recognition. Cortex, 9, 152-164.
Also remarks made by E. K. W. in a lecture given on Oct. 26th 1973 at the M.I.T.
Psychology Department.

Warrington, E. K. 1975 The selective impairment of semantic memory. Quart. J. exp.
Psychol., 27, 635-657.

Zeki, S. M. 1971 Cortical projections from two prestriate areas in the monkey. Brain
Research, 34, 19-35.

Zeki, S. M. 1973 Colour coding in rhesus monkey prestriate cortex. Brain Research, 53, 422-
427.



